Monday, August 29, 2022

[FIXED] How to split a dataframe into 2 by duplicated condition in R

Issue

Suppose I have a data frame named df like so:

 ____________________
| id   |  name | age |
|____________________|
| 0123 | Joe   | 20  |            
|____________________|
| 0123 | Kyle  | 45  |              
|____________________|
| 0333 | Susan | 24  |            
|____________________|
| 0333 | Molly | 80  |              
|____________________|

How can I split this df into two so that neither resulting data frame has any duplicate id values? I am looking for them to be like so:

 ____________________
| id   |  name | age |
|____________________|
| 0123 | Joe   | 20  |            
|____________________|
| 0333 | Susan | 24  |              
|____________________|

 ____________________
| id   |  name | age |
|____________________|
| 0333 | Molly | 80  |            
|____________________|
| 0123 | Kyle  | 45  |              
|____________________|

Let me know if you can help!


Solution

Here is a dplyr solution:

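library(dplyr)

# Keep the first row seen for each id
df1 <- df %>% 
  distinct(id, .keep_all = TRUE)

# Keep the rows of df that have no match in df1
# (with no `by` given, the join uses all shared columns)
df2 <- anti_join(df, df1)

Output:

> df1
   id  name age
1 123   Joe  20
2 333 Susan  24
> df2
   id  name age
1 123  Kyle  45
2 333 Molly  80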
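
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. It rebuilds the sample data from the question. Two details are assumptions rather than part of the original answer: id is stored as character so the leading zeros survive (the output above shows 123 and 333, which suggests id was numeric there), and the join columns are named explicitly through `by` so the join does not print its "Joining, by = ..." message.

```r
library(dplyr)

# Sample data from the question. Storing id as character is an
# assumption, made so that the leading zeros are preserved.
df <- data.frame(
  id   = c("0123", "0123", "0333", "0333"),
  name = c("Joe", "Kyle", "Susan", "Molly"),
  age  = c(20, 45, 24, 80),
  stringsAsFactors = FALSE
)

# First occurrence of each id
df1 <- df %>%
  distinct(id, .keep_all = TRUE)

# Rows of df that are not in df1; naming the join columns explicitly
# is optional and only silences the informational message
df2 <- anti_join(df, df1, by = c("id", "name", "age"))

print(df1)
print(df2)
```

Note that df2 is only free of duplicate ids when each id appears at most twice in df. With three or more rows per id, df2 would still contain repeats and would need to be split again. Also, because the anti join matches on every column, rows that are exact duplicates of a row in df1 are dropped from df2 as well.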


Answered By - TarJae
Answer Checked By - Robin (PHPFixing Admin)
