PHPFixing

Tuesday, November 22, 2022

[FIXED] How to create a column based on multiple criteria in R?

 November 22, 2022     if-statement, multiple-conditions, r

Issue

Currently I have a variable "Sex" that contains 1s and 2s for men and women, respectively. I want to add random noise to this variable, so I generated random numbers from a normal distribution. The next step is to determine which values have to change to the other sex. I use z-values of 2 and -2 as boundaries: if a man (1) is assigned a value > 2, he has to change to a woman (2). It also works the other way around: when a woman (2) is assigned a z-value < -2, the sex variable has to change to man (1). In all other cases, the value stays the same.
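With boundaries at z = 2 and z = -2, the expected share of changed values follows from the standard normal distribution: roughly 2.3% of men (z > 2) and roughly 2.3% of women (z < -2). A quick check in R, assuming the noise is drawn with rnorm() using the default mean 0 and sd 1 (the question does not state the parameters); the seed and row count are arbitrary, for illustration only:

```r
# Probability of a standard normal draw falling beyond each boundary.
p_above <- pnorm(2, lower.tail = FALSE)  # P(Z > 2), chance a man is flipped
p_below <- pnorm(-2)                     # P(Z < -2), chance a woman is flipped
p_above  # approximately 0.0228
p_below  # same value, by symmetry of the normal distribution

# Hypothetical example: one z-value per row for 10,000 rows.
set.seed(42)
n <- 10000
z <- rnorm(n)
mean(z > 2)  # close to p_above; the exact value depends on the seed
```

So with, say, 1,000 men in the data, around 23 of them would be expected to switch to 2.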

I thought an ifelse statement would do the trick. Unfortunately, it did not work. My statement looks like this:

with(Dataset18$New_sex,
     ifelse(Sex== 1 & Norm_dist_random > 2, 2 , ifelse(Sex== 1 & Norm_dist_random <= 2, 1, 
     ifelse(Sex== 2 & Norm_dist_random < -2, 1, ifelse(Sex== 2 & Norm_dist_random >= -2, 2))))
)

My data looks like this:

Sex     Norm_dist_random
 1         0.622221897
 1         2.573726407
 1        -0.298095612
 1         0.717745305
 2        -2.597695772
 2         2.534427904
 2         0.089732903
 2        -0.329274570
 2        -1.173434147

In the end, my data should look like this:

Sex     Norm_dist_random   Sex_new
 1         0.622221897        1
 1         2.573726407        2
 1        -0.298095612        1
 1         0.717745305        1
 2        -2.597695772        1
 2         2.534427904        2
 2         0.089732903        2
 2        -0.329274570        2
 2        -1.173434147        2
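As far as I can tell, the attempt in the question fails for three reasons: with() is given Dataset18$New_sex (a column that does not exist yet) instead of the data frame, so R looks for Sex and Norm_dist_random outside the data and, unless they happen to exist there, stops with an "object not found" error; the innermost ifelse() has no `no` argument, which raises an error as soon as its test is FALSE for some row; and the result is never assigned to a column. A minimal corrected sketch that keeps the nested structure, rebuilding the sample data shown above:

```r
# Sample data from the question.
Dataset18 <- data.frame(
  Sex = c(1, 1, 1, 1, 2, 2, 2, 2, 2),
  Norm_dist_random = c(0.622221897, 2.573726407, -0.298095612, 0.717745305,
                       -2.597695772, 2.534427904, 0.089732903, -0.329274570,
                       -1.173434147)
)

# Pass the data frame itself to with(), give the innermost ifelse() a `no`
# value, and assign the result to a new column.
Dataset18$Sex_new <- with(Dataset18,
  ifelse(Sex == 1 & Norm_dist_random > 2, 2,
    ifelse(Sex == 1 & Norm_dist_random <= 2, 1,
      ifelse(Sex == 2 & Norm_dist_random < -2, 1,
        ifelse(Sex == 2 & Norm_dist_random >= -2, 2, NA)))))  # NA if nothing matches

Dataset18
```

This should reproduce the Sex_new column of the expected output.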

Solution

One approach is dplyr's case_when(), which accepts an arbitrary set of condition/value pairs. Each argument has a left-hand side that evaluates to TRUE or FALSE and a right-hand side that gives the value to assign; the two sides are separated by ~.

Conditions are checked in order, and each row receives the value of the first condition that is TRUE. I added TRUE ~ NA_real_ as a catch-all for rows that meet none of the conditions (for example, rows with a missing Sex or Norm_dist_random).

library(dplyr)
Dataset18 %>% 
  mutate(Sex_new = case_when(Sex == 1 & Norm_dist_random <= 2 ~ 1,
                             Sex == 1 & Norm_dist_random > 2 ~ 2,
                             Sex == 2 & Norm_dist_random < -2 ~ 1,
                             Sex == 2 & Norm_dist_random >= -2 ~ 2,
                             TRUE ~ NA_real_))
#  Sex Norm_dist_random Sex_new
#1   1        0.6222219       1
#2   1        2.5737264       2
#3   1       -0.2980956       1
#4   1        0.7177453       1
#5   2       -2.5976958       1
#6   2        2.5344279       2
#7   2        0.0897329       2
#8   2       -0.3292746       2
#9   2       -1.1734341       2
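The four conditions can also be collapsed into one: a value changes only when a man has z > 2 or a woman has z < -2, and with the 1/2 coding, 3 - Sex swaps 1 and 2. A minimal base-R sketch of that idea, which avoids the dplyr dependency; the helper name flip_sex and its bound parameter are hypothetical, not part of the original answer:

```r
# Hypothetical helper: swap 1 <-> 2 when the noise crosses the boundary.
flip_sex <- function(sex, z, bound = 2) {
  # A man (1) changes when z > bound; a woman (2) changes when z < -bound.
  flip <- (sex == 1 & z > bound) | (sex == 2 & z < -bound)
  # With 1/2 coding, 3 - sex turns 1 into 2 and 2 into 1.
  ifelse(flip, 3 - sex, sex)
}

# Sample data from the question.
Dataset18 <- data.frame(
  Sex = c(1, 1, 1, 1, 2, 2, 2, 2, 2),
  Norm_dist_random = c(0.622221897, 2.573726407, -0.298095612, 0.717745305,
                       -2.597695772, 2.534427904, 0.089732903, -0.329274570,
                       -1.173434147)
)

Dataset18$Sex_new <- flip_sex(Dataset18$Sex, Dataset18$Norm_dist_random)
Dataset18
```

Values exactly on a boundary (z = 2 for a man, z = -2 for a woman) stay unchanged, matching the question's rules, and a missing value in either column comes out as NA, much like the TRUE ~ NA_real_ catch-all.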


Answered By - Ian Campbell
Answer Checked By - David Marino (PHPFixing Volunteer)

Copyright © PHPFixing