Issue
I have session id's, client id's, a conversion column and all with a specific date. I want to delete the rows after the last purchase of a client. My data looks as follows:
SessionId ClientId Conversion Date
1 1 0 05-01
2 1 0 06-01
3 1 0 07-01
4 1 1 08-01
5 1 0 09-01
6 2 0 05-01
7 2 1 06-01
8 2 0 07-01
9 2 1 08-01
10 2 0 09-01
As output I want:
SessionId ClientId Conversion Date
1 1 0 05-01
2 1 0 06-01
3 1 1 07-01
6 2 0 05-01
7 2 1 06-01
8 2 0 07-01
9 2 1 08-01
I looks quite easy, but it has some conditions. Based on the client id, the sessions after the last purchase of a cutomer need to be deleted. I have many observations, so deleting after a particular date is not possible. It need to check every client id on when someone did a purchase.
I have no clue what kind of function I need to use for this. Maybe a certain kind of loop?
Hopefully someone can help me with this.
Solution
If your data is already ordered according to Date, for each ClientId we can select all the rows before the last conversion took place.
This can be done in base R :
subset(df, ave(Conversion == 1, ClientId, FUN = function(x) seq_along(x) <= max(which(x))))
Using dplyr :
library(dplyr)
df %>% group_by(ClientId) %>% filter(row_number() <= max(which(Conversion == 1)))
Or data.table :
library(data.table)
setDT(df)[, .SD[seq_len(.N) <= max(which(Conversion == 1))], ClientId]
Answered By - Ronak Shah Answer Checked By - Gilberto Lyons (PHPFixing Admin)
0 Comments:
Post a Comment
Note: Only a member of this blog may post a comment.