Issue
I have session id's, client id's, a conversion column and all with a specific date. I want to delete the rows after the last purchase of a client. My data looks as follows:
SessionId ClientId Conversion Date
1 1 0 05-01
2 1 0 06-01
3 1 0 07-01
4 1 1 08-01
5 1 0 09-01
6 2 0 05-01
7 2 1 06-01
8 2 0 07-01
9 2 1 08-01
10 2 0 09-01
As output I want:
SessionId ClientId Conversion Date
1 1 0 05-01
2 1 0 06-01
3 1 1 07-01
6 2 0 05-01
7 2 1 06-01
8 2 0 07-01
9 2 1 08-01
I looks quite easy, but it has some conditions. Based on the client id, the sessions after the last purchase of a cutomer need to be deleted. I have many observations, so deleting after a particular date is not possible. It need to check every client id on when someone did a purchase.
I have no clue what kind of function I need to use for this. Maybe a certain kind of loop?
Hopefully someone can help me with this.
Solution
If your data is already ordered according to Date
, for each ClientId
we can select all the rows before the last conversion took place.
This can be done in base R :
subset(df, ave(Conversion == 1, ClientId, FUN = function(x) seq_along(x) <= max(which(x))))
Using dplyr
:
library(dplyr)
df %>% group_by(ClientId) %>% filter(row_number() <= max(which(Conversion == 1)))
Or data.table
:
library(data.table)
setDT(df)[, .SD[seq_len(.N) <= max(which(Conversion == 1))], ClientId]
Answered By - Ronak Shah Answer Checked By - Gilberto Lyons (PHPFixing Admin)
0 Comments:
Post a Comment
Note: Only a member of this blog may post a comment.