PHPFixing
  • Privacy Policy
  • TOS
  • Ask Question
  • Contact Us
  • Home
  • PHP
  • Programming
  • SQL Injection
  • Web3.0

Saturday, October 29, 2022

[FIXED] How to left_join() two datasets but only select specific columns from one of the datasets?

 October 29, 2022     dplyr, left-join, r, select, tidyverse     No comments   

Issue

Here are two datasets: (this is fake data)

library(tidyverse)
myfruit <- tibble(fruit_name = c("apple", "pear", "banana", "cherry"),
                  number = c(2, 4, 6, 8))

fruit_info <- tibble(fruit_name = c("apple", "pear", "banana", "cherry"),
                     colour = c("red", "green", "yellow", "d.red"),
                     batch_number = c(4, 4, 4, 4),
                     type = c("gala", "conference", "cavendish", "bing"),
                     weight = c(10, 11, 12, 13),
                     age_days = c(20, 22, 24, 16))

> myfruit
# A tibble: 4 x 2
  fruit_name number
  <chr>       <dbl>
1 apple           2
2 pear            4
3 banana          6
4 cherry          8

> fruit_info
# A tibble: 4 x 6
  fruit_name colour batch_number type       weight age_days
  <chr>      <chr>         <dbl> <chr>       <dbl>    <dbl>
1 apple      red               4 gala           10       20
2 pear       green             4 conference     11       22
3 banana     yellow            4 cavendish      12       24
4 cherry     d.red             4 bing           13       16

I want to use the dplyr::left_join() function to combine myfruit and fruit_info together, but I only want "batch_number" and "type" columns from fruit_info.

I know I can do this:

new_myfruit <- left_join(myfruit, fruit_info, by = "fruit_name") %>%
  select(fruit_name, number, batch_number, type)

# OR

new_myfruit <- left_join(myfruit, fruit_info, by = "fruit_name") %>%
  select(-colour, -weight, -age_days)

But if I had many more columns in fruit_info and I had to type in many column names into the select() function it would be very time-consuming. So, is there a more efficient way to do this?

Edit:

I've seen examples online where you can do something like this:

new_myfruits  <- left_join(myfruit,
                           fruit_info %>% select(batch_number, type),
                           by = "fruit_name")

But I get an error which says:

# Error: Join columns must be present in data.
#  x Problem with `fruit_name`.

Anyone know what I'm doing wrong?

I would appreciate any help :)


Solution

Try this. You can combine select() with contains() and in the last function add the tags you want to extract, so there is no need of setting each name individually or by column number. Here the code:

library(dplyr)
#Code
newdf <- left_join(myfruit, fruit_info, by = "fruit_name") %>%
  select(contains(c('fruit','number','type')))

Output:

# A tibble: 4 x 4
  fruit_name number batch_number type      
  <chr>       <dbl>        <dbl> <chr>     
1 apple           2            4 gala      
2 pear            4            4 conference
3 banana          6            4 cavendish 
4 cherry          8            4 bing      


Answered By - Duck
Answer Checked By - David Goodson (PHPFixing Volunteer)
  • Share This:  
  •  Facebook
  •  Twitter
  •  Stumble
  •  Digg
Newer Post Older Post Home

0 Comments:

Post a Comment

Note: Only a member of this blog may post a comment.

Total Pageviews

Featured Post

Why Learn PHP Programming

Why Learn PHP Programming A widely-used open source scripting language PHP is one of the most popular programming languages in the world. It...

Subscribe To

Posts
Atom
Posts
Comments
Atom
Comments

Copyright © PHPFixing