Issue
I have a data.table with entry and exit dates per individual and a text column indicating the reason for the exit. My data looks like this:
dt <- data.table (ID = c(1,2,3,4,5),
entry = c("01/01/2010", "01/02/2016", "01/05/2010", "01/09/2013", "01/01/2010"),
exit = c("31/12/2010", "01/01/2021", "30/09/2010", "31/12/2015", "30/09/2010"),
text = c("a", NA, "c", NA, "b"),
result_2010 = c(NA, NA, NA, NA,NA))
ID entry exit text result_2010
1: 1 01/01/2010 31/12/2010 a NA
2: 2 01/02/2016 01/01/2021 <NA> NA
3: 3 01/05/2010 30/09/2010 c NA
4: 4 01/09/2013 31/12/2015 <NA> NA
5: 5 01/01/2010 30/09/2010 b NA
In the column "result_2010" I would like to decide if the individual has left the company between 01.01.2010 and 31.12.2010, but only if in column "text" the individual has either "a" or "c". Otherwise the result should return "false".
The result should look something like this:
ID entry exit text result_2010
1: 1 01/01/2010 31/12/2010 a TRUE
2: 2 01/02/2016 01/01/2021 <NA> FALSE
3: 3 01/05/2010 30/09/2010 c TRUE
4: 4 01/09/2013 31/12/2015 <NA> FALSE
5: 5 01/01/2010 30/09/2010 b FALSE
Does anyone has an idea how I could do this?
Solution
We can convert the columns to Date
class, and create a logical column based on the conditions in the OP's post
library(dplyr)
library(lubridate)
dt %>%
mutate(across(c(entry, exit), dmy)) %>%
mutate(result_2010 = entry >= as.Date('2010-01-01') &
exit <= as.Date("2010-12-31") & text %in% c("a", "c"))
-output
ID entry exit text result_2010
1: 1 2010-01-01 2010-12-31 a TRUE
2: 2 2016-02-01 2021-01-01 <NA> FALSE
3: 3 2010-05-01 2010-09-30 c TRUE
4: 4 2013-09-01 2015-12-31 <NA> FALSE
5: 5 2010-01-01 2010-09-30 b FALSE
Answered By - akrun Answer Checked By - David Goodson (PHPFixing Volunteer)
0 Comments:
Post a Comment
Note: Only a member of this blog may post a comment.