Issue

I have read some posts here on how to filter rows based on specific values in a given column. What I am interested in, however, is whether I can filter on a randomly selected unique value of a column. To illustrate the question, please consider the following sample dataframe:

MeasurementPoint <- c(1,2,1,2,3,3,4,4,6,7,6,7)
subject <- c(1,1,1,1,2,2,3,3,4,4,4,4)
MeasurementMethod <- c("A","A", "B", "B", "A","B", "A","B","A","A", "B","B")
value <- c(-0.06, 0.11, -0.11, -0.01, -0.13, 0.02, -0.08, 0.09, 0.05, 0.04, -0.03, -0.02)
df1 <- data.frame(MeasurementPoint, subject,MeasurementMethod, value)
df1

 MeasurementPoint subject MeasurementMethod value
         1            1            A        -0.06
         2            1            A         0.11
         1            1            B        -0.11
         2            1            B        -0.01
         3            2            A        -0.13
         3            2            B         0.02
         4            3            A        -0.08
         4            3            B         0.09
         6            4            A         0.05
         7            4            A         0.04
         6            4            B        -0.03
         7            4            B        -0.02

The values are measured on different subjects, using two different MeasurementMethods and at different MeasurementPoints, e.g. multiple spots on their body.

Some subjects have more than one MeasurementPoint, like subjects #1 and #4. The rest have only one MeasurementPoint on their bodies, and only the MeasurementMethod varies for them (subjects #2 and #3).

I would like to keep only one MeasurementPoint per subject and drop the rest. The MeasurementPoint should be chosen at random. As an example, the following dataframe would be an outcome of interest:

  MeasurementPoint subject MeasurementMethod value
                2       1                 A  0.11
                2       1                 B -0.01
                3       2                 A -0.13
                3       2                 B  0.02
                4       3                 A -0.08
                4       3                 B  0.09
                6       4                 A  0.05
                6       4                 B -0.03

Please note that the selection of MeasurementPoint = 2 for the first subject and MeasurementPoint = 6 for the last subject should happen randomly.

Solution

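We can group_by the subject column, then filter to keep the rows whose MeasurementPoint equals a single value drawn at random by sample() within each group. Because the draw is random, the output below is just one possible result and will change between runs unless a seed is set with set.seed().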

library(dplyr)

df1 %>% 
  group_by(subject) %>% 
  filter(MeasurementPoint == sample(MeasurementPoint, 1))

# A tibble: 8 × 4
# Groups:   subject [4]
  MeasurementPoint subject MeasurementMethod value
             <dbl>   <dbl> <chr>             <dbl>
1                1       1 A                 -0.06
2                1       1 B                 -0.11
3                3       2 A                 -0.13
4                3       2 B                  0.02
5                4       3 A                 -0.08
6                4       3 B                  0.09
7                6       4 A                  0.05
8                6       4 B                 -0.03
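
One caveat with the snippet above: when sample() is given a single number x greater than 1, base R samples from 1:x rather than returning x. A subject with exactly one row could therefore end up with no rows at all. The sketch below is one possible way around that, not part of the original answer. It draws by position from the unique MeasurementPoint values, so each distinct point is also equally likely regardless of how many rows it has. The helper name pick_one and the seed value are assumptions made for illustration.

```r
library(dplyr)

# Sample data from the question (values as shown in the printed table)
df1 <- data.frame(
  MeasurementPoint  = c(1, 2, 1, 2, 3, 3, 4, 4, 6, 7, 6, 7),
  subject           = c(1, 1, 1, 1, 2, 2, 3, 3, 4, 4, 4, 4),
  MeasurementMethod = c("A", "A", "B", "B", "A", "B", "A", "B", "A", "A", "B", "B"),
  value             = c(-0.06, 0.11, -0.11, -0.01, -0.13, 0.02,
                        -0.08, 0.09, 0.05, 0.04, -0.03, -0.02)
)

# pick_one() is a hypothetical helper: it draws one element by position,
# so a length-one numeric vector is never expanded to 1:x by sample()
pick_one <- function(x) x[sample.int(length(x), 1)]

set.seed(42)  # arbitrary seed, only to make the draw repeatable

result <- df1 %>%
  group_by(subject) %>%
  # keep the rows of one randomly chosen distinct MeasurementPoint per subject
  filter(MeasurementPoint == pick_one(unique(MeasurementPoint))) %>%
  ungroup()

result
```

With the sample data this keeps eight rows, two per subject, and which MeasurementPoint is kept for subjects 1 and 4 depends on the seed.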

Answered By - benson23

Answer Checked By - Robin (PHPFixing Admin)

Friday, October 7, 2022

[FIXED] How can I filter a dataframe based on (randomly selected) unique values of a column?
