Issue
I would like to replace ranges of integer values with character strings based on conditions.
For example, I have this data frame:
  Gender Grade Indus
1      1   610    15
2      1   110    29
3      2   210    32
4      1   250    20
5      2   420    37
6      2   430    19
7      1   450    25
I would like to replace the values in the 'Grade' column with a string, depending on which of the following sets each value belongs to:
prima =c(110,210:250,610)
secon =c(420,440:460)
vocat =c(430,470)
If the number in 'Grade' falls in prima (for example, Grade == 610), I would like to change it to the word 'Primary'.
I first tried:
mydf$Grade[mydf$Grade == prima] <- "Primary"
mydf$Grade[mydf$Grade == secon] <- "Secondary"
mydf$Grade[mydf$Grade == vocat] <- "Vocational"
but it did not work. It did not return an error, but only a few values changed to 'Primary' or 'Secondary', leaving most of the other numbers unchanged.
I have also tried...
for (i in mydf$Grade) {
  if (i %in% prima) mydf$Grade <- "Primary"
  else if (i %in% secon) mydf$Grade <- "Secondary"
  else if (i %in% vocat) mydf$Grade <- "Vocational"
}
which also did not work: all the values in 'Grade' turned to 'Primary' instead. I tried both methods on the real data, where I also have to loop over 10 years.
I don't know what I did wrong. These methods worked when I wanted to replace values with NaN, but they do not work when I want to replace them with other integers or strings. Any advice would be much appreciated.
Solution
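== compares element by element: it pairs values by position and recycles the shorter vector, so it only matches when equal values happen to line up. Since we want to test whether each value appears anywhere in a set of values, use %in% instead: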
mydf$Grade[mydf$Grade %in% prima] <- "Primary"
mydf$Grade[mydf$Grade %in% secon] <- "Secondary"
mydf$Grade[mydf$Grade %in% vocat] <- "Vocational"
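To see the difference between the two operators in isolation, here is a minimal sketch. The vectors x and lookup are made-up names and values for illustration only; they are not part of the original question.

```r
# Hypothetical example values, chosen so the recycling is easy to follow.
x      <- c(610, 110, 210, 250)
lookup <- c(110, 610)

# == recycles lookup to c(110, 610, 110, 610) and compares position by position:
# 610 vs 110, 110 vs 610, 210 vs 110, 250 vs 610.
by_position <- x == lookup
by_position
# [1] FALSE FALSE FALSE FALSE

# %in% asks, for each element of x, whether it occurs anywhere in lookup.
by_membership <- x %in% lookup
by_membership
# [1]  TRUE  TRUE FALSE FALSE
```

Both 610 and 110 are in lookup, yet == finds neither of them because they sit at the "wrong" positions. That is the same effect that left most of the 'Grade' values unchanged in the first attempt.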
Or use dplyr::case_when
library(dplyr)
mydf %>%
  mutate(Grade = case_when(Grade %in% prima ~ "Primary",
                           Grade %in% secon ~ "Secondary",
                           Grade %in% vocat ~ "Vocational"))
#  Gender      Grade Indus
#1      1    Primary    15
#2      1    Primary    29
#3      2    Primary    32
#4      1    Primary    20
#5      2  Secondary    37
#6      2 Vocational    19
#7      1  Secondary    25
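One thing to watch with case_when: a value that matches none of the conditions becomes NA. If the real data contains codes outside the three sets, a catch-all condition can keep them. The sketch below assumes a small made-up data frame named demo, and the code 999 is invented to stand for an unmatched value.

```r
library(dplyr)

prima <- c(110, 210:250, 610)
secon <- c(420, 440:460)
vocat <- c(430, 470)

# Hypothetical data: 999 is an invented code that belongs to none of the sets.
demo <- data.frame(Grade = c(610L, 420L, 430L, 999L))

result <- demo %>%
  mutate(Grade = case_when(
    Grade %in% prima ~ "Primary",
    Grade %in% secon ~ "Secondary",
    Grade %in% vocat ~ "Vocational",
    TRUE ~ as.character(Grade)  # fallback: keep the unmatched code as text
  ))

result$Grade
# [1] "Primary"    "Secondary"  "Vocational" "999"
```

The fallback is converted with as.character because every branch of case_when has to return the same type.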
Data
mydf <- structure(list(Gender = c(1L, 1L, 2L, 1L, 2L, 2L, 1L), Grade = c(610L,
110L, 210L, 250L, 420L, 430L, 450L), Indus = c(15L, 29L, 32L,
20L, 37L, 19L, 25L)), class = "data.frame", row.names = c(NA, -7L))
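As for the loop in the question, each branch assigns a single string to the entire 'Grade' column, so every row is overwritten on each iteration. If a loop is preferred, one possible repair is to index by position and assign to one element at a time. This is only a sketch: grade and label are illustrative names, and grade simply repeats the Grade values from the data above.

```r
prima <- c(110, 210:250, 610)
secon <- c(420, 440:460)
vocat <- c(430, 470)

# Same values as the Grade column in the data above.
grade <- c(610L, 110L, 210L, 250L, 420L, 430L, 450L)

# Start from the codes as text so unmatched values are left as they are.
label <- as.character(grade)

for (i in seq_along(grade)) {
  if (grade[i] %in% prima) {
    label[i] <- "Primary"      # assign to element i only, not the whole vector
  } else if (grade[i] %in% secon) {
    label[i] <- "Secondary"
  } else if (grade[i] %in% vocat) {
    label[i] <- "Vocational"
  }
}

label
# [1] "Primary"    "Primary"    "Primary"    "Primary"    "Secondary"
# [6] "Vocational" "Secondary"
```

The vectorised %in% version does the same work without a loop and is usually the simpler choice.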
Answered By - Ronak Shah
Answer Checked By - Dawn Plyler (PHPFixing Volunteer)