Issue

I have a set of biological count data in an R data frame with 200,000 entries. I want to write a function that identifies the peaks within the count data. By peaks, I mean the top 50 counts. I expect there to be multiple peaks in this dataset, as the median value is 0. When I run:

> summary(df$V3)

My output looks like this:

    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
    0.00     0.00     0.00     1.82     1.00 94746.00

I want to write a function that lists the peaks and then looks at the values on either side of each peak (positions -1 and +1) to produce a ratio. Can anyone help with this?

My data frame is called df and looks like this:

V1    V2    V3   
gene  1     6
gene  2     0
gene  3     0
gene  4     10
....

My expected output is a data frame identifying the peaks and their positions (V2), so that I can examine the values on either side of each peak and compute a ratio for analysis.

Solution

This is a crude approach, but it gives you the values on either side of each peak, from which you can compute a ratio.

Here I treat any value higher than the mean as a peak.

library(tidyverse)

"V1    V2    V3
gene  1     6
gene  2     0
gene  3     0
gene  4     10
gene  5     1" %>% 
  read_table() -> df

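# peak threshold: the mean of V3 (1.82 on the full data set);
# computed rather than hard-coded, and not named `mean` so base::mean() is not masked
threshold <- mean(df$V3)

df %>% 
  filter(V3 > threshold) %>% 
  pull(V2) -> ids


df %>% 
  mutate(minus_peaks = lag(V3),    # count at position V2 - 1
         plus_peaks = lead(V3)) %>%    # count at position V2 + 1
  filter(V2 %in% ids)

# A tibble: 2 × 5
  V1       V2    V3 minus_peaks plus_peaks
  <chr> <dbl> <dbl>       <dbl>      <dbl>
1 gene      1     6          NA          0
2 gene      4    10           0          1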
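The question asks for the top 50 counts and a ratio rather than a mean threshold. Below is a minimal sketch of that as a function, under a few assumptions. It assumes the rows within each V1 group cover consecutive V2 positions with no gaps, because lag()/lead() work on row order, not on V2 values. It also assumes the ratio of interest is the -1 neighbour divided by the +1 neighbour. find_peaks() and the ratio column are hypothetical names, so swap in whichever ratio your analysis needs.

```r
library(dplyr)

# Hypothetical helper: returns the n largest counts in V3 together with the
# counts at the neighbouring rows (assumed to be positions V2 - 1 and V2 + 1)
# and an assumed ratio of the -1 neighbour to the +1 neighbour.
find_peaks <- function(df, n = 50) {
  df %>%
    group_by(V1) %>%                          # neighbours only within the same V1
    arrange(V2, .by_group = TRUE) %>%         # put rows in positional order
    mutate(minus_peaks = lag(V3),             # count one row before (V2 - 1)
           plus_peaks  = lead(V3)) %>%        # count one row after (V2 + 1)
    ungroup() %>%
    slice_max(V3, n = n, with_ties = FALSE) %>%  # top n counts overall
    mutate(ratio = minus_peaks / plus_peaks)     # assumed ratio definition
}
```

For example, find_peaks(df) returns the 50 largest counts, and on the five-row example above find_peaks(df, n = 2) returns positions 4 and 1. The ratio is NA when a neighbour is missing (first or last position of a group) and Inf or NaN when the +1 neighbour is 0, which will be common when the median count is 0.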

Answered By - Mohan Govindasamy

Answer Checked By - Marilyn (PHPFixing Volunteer)

Thursday, October 6, 2022

[FIXED] How to write a function in R which will identify peaks within a dataframe
