PHPFixing

Friday, October 7, 2022

[FIXED] How can I filter a dataframe based on (randomly selected) unique values of a column?

Tags: dataframe, dplyr, r, random, statistics

Issue

I have read some posts here on how to filter a data frame by specific values in a given column. What I am interested in, however, is whether I can filter by randomly selected unique values of a column. To illustrate the question, please consider the following sample data frame:

MeasurementPoint <- c(1,2,1,2,3,3,4,4,6,7,6,7)
subject <- c(1,1,1,1,2,2,3,3,4,4,4,4)
MeasurementMethod <- c("A","A", "B", "B", "A","B", "A","B","A","A", "B","B")
value <- c(-0.06, 0.11, -0.11, -0.01, -0.13, 0.02, -0.08, 0.09, 0.05, 0.04, -0.03, -0.02)
df1 <- data.frame(MeasurementPoint, subject, MeasurementMethod, value)
df1
 MeasurementPoint subject MeasurementMethod value
         1            1            A        -0.06
         2            1            A         0.11
         1            1            B        -0.11
         2            1            B        -0.01
         3            2            A        -0.13
         3            2            B         0.02
         4            3            A        -0.08
         4            3            B         0.09
         6            4            A         0.05
         7            4            A         0.04
         6            4            B        -0.03
         7            4            B        -0.02

The values are measured on different subjects, with two different MeasurementMethods and at different MeasurementPoints (e.g. multiple spots on the body).

Some subjects have more than one MeasurementPoint, like subjects #1 and #4. The rest (subjects #2 and #3) have only one MeasurementPoint, and only the MeasurementMethod varies for them.
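The structure described above can be checked with a short base R sketch. It rebuilds the sample data and counts the distinct MeasurementPoint values per subject; the name points_per_subject is only illustrative and is not part of the original question.

```r
# Rebuild the sample data from the question
df1 <- data.frame(
  MeasurementPoint  = c(1, 2, 1, 2, 3, 3, 4, 4, 6, 7, 6, 7),
  subject           = c(1, 1, 1, 1, 2, 2, 3, 3, 4, 4, 4, 4),
  MeasurementMethod = c("A", "A", "B", "B", "A", "B", "A", "B", "A", "A", "B", "B"),
  value             = c(-0.06, 0.11, -0.11, -0.01, -0.13, 0.02,
                        -0.08, 0.09, 0.05, 0.04, -0.03, -0.02)
)

# Number of distinct MeasurementPoint values for each subject
# (points_per_subject is an illustrative name)
points_per_subject <- tapply(
  df1$MeasurementPoint,
  df1$subject,
  function(x) length(unique(x))
)

print(points_per_subject)
```

Subjects with a count above one are the ones for which a random choice actually has to be made.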

I would like to keep only one MeasurementPoint per subject and drop the rest. This selection should be made at random. As an example, the following data frame would be an outcome of interest:

  MeasurementPoint subject MeasurementMethod value
                2       1                 A  0.11
                2       1                 B -0.01
                3       2                 A -0.13
                3       2                 B  0.02
                4       3                 A -0.08
                4       3                 B  0.09
                6       4                 A  0.05
                6       4                 B -0.03

Please note that the selection of MeasurementPoint = 2 for the first subject and MeasurementPoint = 6 for the last subject should happen randomly.


Solution

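We can group_by the subject column, then filter to keep only the rows whose MeasurementPoint equals a value drawn at random from that group with sample. Because the draw is random, the output shown below is only one possible result.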

library(dplyr)

df1 %>% 
  group_by(subject) %>% 
  filter(MeasurementPoint == sample(MeasurementPoint, 1))

# A tibble: 8 × 4
# Groups:   subject [4]
  MeasurementPoint subject MeasurementMethod value
             <dbl>   <dbl> <chr>             <dbl>
1                1       1 A                 -0.06
2                1       1 B                 -0.11
3                3       2 A                 -0.13
4                3       2 B                  0.02
5                4       3 A                 -0.08
6                4       3 B                  0.09
7                6       4 A                  0.05
8                6       4 B                 -0.03
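One caveat with sample(MeasurementPoint, 1): when a group consists of a single row whose value is a number x >= 1, sample(x, 1) draws from 1:x rather than returning x, so the group can be filtered out entirely. Sampling from the raw column also weights each point by its number of rows. Neither matters for the sample data here, but a slightly more defensive sketch is shown below. The helper pick_one and the seed value are assumptions added for illustration and are not part of the original answer.

```r
library(dplyr)

# Sample data from the question
df1 <- data.frame(
  MeasurementPoint  = c(1, 2, 1, 2, 3, 3, 4, 4, 6, 7, 6, 7),
  subject           = c(1, 1, 1, 1, 2, 2, 3, 3, 4, 4, 4, 4),
  MeasurementMethod = c("A", "A", "B", "B", "A", "B", "A", "B", "A", "A", "B", "B"),
  value             = c(-0.06, 0.11, -0.11, -0.01, -0.13, 0.02,
                        -0.08, 0.09, 0.05, 0.04, -0.03, -0.02)
)

# Hypothetical helper: pick one of the distinct values of x with equal
# probability. Indexing via sample.int() avoids the special case where
# sample() treats a single number x as the range 1:x.
pick_one <- function(x) {
  candidates <- unique(x)
  candidates[sample.int(length(candidates), 1)]
}

set.seed(42)  # arbitrary seed, only to make the random draw repeatable

result <- df1 %>%
  group_by(subject) %>%
  filter(MeasurementPoint == pick_one(MeasurementPoint)) %>%
  ungroup()

print(result)
```

Removing the set.seed call gives a fresh random selection on every run, which is what the question asks for. The seed is only useful when the result needs to be reproduced.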


Answered By - benson23
Answer Checked By - Robin (PHPFixing Admin)