PHPFixing
  • Privacy Policy
  • TOS
  • Ask Question
  • Contact Us
  • Home
  • PHP
  • Programming
  • SQL Injection
  • Web3.0

Saturday, October 29, 2022

[FIXED] How to execute a left join in R?

 October 29, 2022     dplyr, left-join, r     No comments   

Issue

Below is the sample data and one manipulation. The first data set is employment specific to an industry. The second data set is overall employment and unemployment rate. I am seeking to do a left join (or at least that's what I think it should be) to achieve the desired result below. When I do it, I get a one to many issue with the row count growing. In this example, it goes from 14 to 18. In the larger data set, it goes from 228 to 4348. Primary question is if this can be done with only a properly written join script or is there more to it?

 area1<-c(000000,000000,000000,000000,000000,000000,000000,000000,000000,000000,000000,000000,000000,000000)
 periodyear<-c(2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2021,2021)
 month<-c(1,2,3,4,5,6,7,8,9,10,11,12,1,2)
 emp1 <-c(10,11,12,13,14,15,16,17,20,21,22,24,26,28)

 firstset<-data.frame(area1,periodyear,month,emp1)



 area1<-c(000000,000000,000000,000000,000000,000000,000000,000000,000000,000000,000000,000000,000000,000000)
 periodyear1<-c(2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020,2021,2021)
 period<-c(01,02,03,04,05,06,07,08,09,10,11,12,01,02)
 rate<-c(3.0,3.2,3.4,3.8,2.5,4.5,6.5,9.1,10.6,5.5,7.8,6.5,4.5,2.9)
 emp2<-c(1001,1002,1005,1105,1254,1025,1078,1106,1099,1188,1254,1250,1301,1188)

 secondset<-data.frame(area2,periodyear1,period,rate,emp2)

 secondset <- secondset%>%mutate(month = as.numeric(period))

 secondset <- left_join(firstset,secondset, by=c("month"))

Desired Result (14 rows with below being the first 3)

 area1     periodyear   month     emp1    rate    emp2
000000         2020        1        10      3.0    1001
000000         2020        2        11      3.2    1002
000000         2020        3        12      3.4    1005

Solution

We may have to add 'periodyear' as well in the by

library(dplyr)
left_join(firstset,secondset, by=c("periodyear" = "periodyear1", 
      "area1" = "area2", "month"))

-output

   area1 periodyear month emp1 period rate emp2
1      0       2020     1   10      1  3.0 1001
2      0       2020     2   11      2  3.2 1002
3      0       2020     3   12      3  3.4 1005
...


Answered By - akrun
Answer Checked By - Robin (PHPFixing Admin)
  • Share This:  
  •  Facebook
  •  Twitter
  •  Stumble
  •  Digg
Newer Post Older Post Home

0 Comments:

Post a Comment

Note: Only a member of this blog may post a comment.

Total Pageviews

Featured Post

Why Learn PHP Programming

Why Learn PHP Programming A widely-used open source scripting language PHP is one of the most popular programming languages in the world. It...

Subscribe To

Posts
Atom
Posts
Comments
Atom
Comments

Copyright © PHPFixing