PHPFixing

Tuesday, November 22, 2022

[FIXED] How to extract minimum and maximum values based on conditions in R

Tags: loops, max, minimum, multiple-conditions, r

Issue

I have a data frame with thousands of rows, and I need to output the minimum and maximum values for sections of data that belong to the same group and class. Within each group and class, I want to read the first start value and compare it to the previous value in the end column. If the start value is smaller, I move on to the next row, and so on, until a start value is larger than the previous end value. At that point the section is complete, and I output its minimum start value and its maximum end value. My data is already ordered by group, class, start and end.

df <- data.frame(group = c("1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1"),
  class = c("2", "2", "2", "2", "2", "2", "2", "3", "3", "3", "3", "3", "3", "3", "3", "3", "3", "3", "3", "3"),
  start = c("23477018","23535465","23567386","24708741","24708741","24708741","48339885","87274","87274","127819","1832772","1832772","1832772","6733569","7005524","7005524","7644572","8095433","8095433","8095433"),
  end = c("47341413", "47341413", "47909872","42247834","47776347","47909872","53818713","3161655","3479466","3503792","3503792","4916249","5329014","8089225","12037894","13934484","12037894","12037894","13626119","13934484"))

The output that I want to achieve is:

  group class    start      end
1     1     2 23477018 47909872
2     1     2 48339885 53818713
3     1     3    87274  5329014
4     1     3  6733569 13934484

Any ideas on how to achieve this will be very much appreciated.


Solution

I used data.table for this.
My approach was to first convert start and end to integers. They are stored as character strings, so comparing them as-is would order them alphabetically rather than numerically.
Next, flag the rows where start > max(all prior ends) within each group and class, then use cumsum on that flag to give an increasing sub-group number.
After that, it is just a simple min and max by sub-group.
No explicit loops are used, which keeps this fast on large data.
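
To make the flag-and-cumsum step concrete, here is a minimal base-R sketch on the class 3 rows from the question only. The variable names (prev_max_end, new_section, sections) are illustrative and not part of the original answer, and the values are entered as numbers rather than strings to keep the example short.

```r
# Class 3 rows from the question, already numeric and ordered by start
start <- c(87274, 87274, 127819, 1832772, 1832772, 1832772, 6733569,
           7005524, 7005524, 7644572, 8095433, 8095433, 8095433)
end   <- c(3161655, 3479466, 3503792, 3503792, 4916249, 5329014, 8089225,
           12037894, 13934484, 12037894, 12037894, 13626119, 13934484)

# Running maximum of all earlier ends (0 for the first row).
# This mirrors shift(cummax(end), fill = 0) in the data.table code.
prev_max_end <- c(0, head(cummax(end), -1))

# TRUE where a row starts beyond every end seen so far, i.e. a new section
new_section <- start > prev_max_end

# Cumulative sum of the flags numbers the sections 1, 2, ...
subgrp <- cumsum(new_section)

# Minimum start and maximum end per section
sections <- data.frame(start = as.vector(tapply(start, subgrp, min)),
                       end   = as.vector(tapply(end, subgrp, max)))
print(sections)
```

Only the first row and the row starting at 6733569 are flagged, so the thirteen rows collapse into two sections, matching rows 3 and 4 of the desired output.
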

library(data.table)
df <- data.frame(group = c("1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1"),
                 class = c("2", "2", "2", "2", "2", "2", "2", "3", "3", "3", "3", "3", "3", "3", "3", "3", "3", "3", "3", "3"),
                 start = c("23477018","23535465","23567386","24708741","24708741","24708741","48339885","87274","87274","127819","1832772","1832772","1832772","6733569","7005524","7005524","7644572","8095433","8095433","8095433"),
                 end = c("47341413", "47341413", "47909872","42247834","47776347","47909872","53818713","3161655","3479466","3503792","3503792","4916249","5329014","8089225","12037894","13934484","12037894","12037894","13626119","13934484"))

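# Convert the data.frame to a data.table by reference
setDT(df)
# start and end are character columns; make them integers so comparisons are numeric
df[, c('start', 'end') := lapply(.SD, as.integer), .SDcols = c('start', 'end')]
# Within each group/class, a new sub-group begins where start exceeds the running max of all prior ends
df[, subgrp := cumsum(start > shift(cummax(.SD$end), fill = 0)), keyby = c('group', 'class')]
# Collapse each sub-group to its minimum start and maximum end
ans <- df[, .(start = min(start), end = max(end)), keyby = c('group', 'class', 'subgrp')]
# Drop the helper column and print the result
ans[, subgrp := NULL][]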

   group class    start      end
1:     1     2 23477018 47909872
2:     1     2 48339885 53818713
3:     1     3    87274  5329014
4:     1     3  6733569 13934484
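
If data.table is not available, the same idea can probably be expressed in base R. The following is a sketch under that assumption and is not part of the original answer: it uses ave() for the per-group running maximum and aggregate() for the summary, and the names prev_max_end, lo, hi and res are hypothetical. The data is the question's, entered as numbers so no conversion step is needed.

```r
df <- data.frame(
  group = rep("1", 20),
  class = rep(c("2", "3"), times = c(7, 13)),
  start = c(23477018, 23535465, 23567386, 24708741, 24708741, 24708741, 48339885,
            87274, 87274, 127819, 1832772, 1832772, 1832772, 6733569, 7005524,
            7005524, 7644572, 8095433, 8095433, 8095433),
  end   = c(47341413, 47341413, 47909872, 42247834, 47776347, 47909872, 53818713,
            3161655, 3479466, 3503792, 3503792, 4916249, 5329014, 8089225,
            12037894, 13934484, 12037894, 12037894, 13626119, 13934484)
)

# Running max of earlier ends within each group/class (0 for the first row)
prev_max_end <- ave(df$end, df$group, df$class,
                    FUN = function(e) c(0, head(cummax(e), -1)))

# Flag rows that open a new section, then number the sections per group/class
df$subgrp <- ave(as.numeric(df$start > prev_max_end), df$group, df$class,
                 FUN = cumsum)

# Minimum start and maximum end per group/class/section
lo <- aggregate(start ~ group + class + subgrp, data = df, FUN = min)
hi <- aggregate(end ~ group + class + subgrp, data = df, FUN = max)
res <- merge(lo, hi)
res <- res[order(res$group, res$class, res$subgrp),
           c("group", "class", "start", "end")]
rownames(res) <- NULL
print(res)
```

This relies on the rows already being ordered by start within each group and class, as stated in the question. On thousands of rows either version should be quick, though data.table is likely to scale better.
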


Answered By - Brian Montgomery
Answer Checked By - Cary Denson (PHPFixing Admin)