PHPFixing

Tuesday, May 17, 2022

[FIXED] How to subset multiple columns from df including grep match

Tags: dataframe, match, partial, r, subset

Issue

I have a very large data set that includes multiple columns with common portions of their names (e.g. ctq_1, ctq_2, ctq_3 and also panas_1, panas_2, panas_3). I'd like to subset some of those columns (e.g. only those containing 'panas' in the column name) alongside certain other columns from the same data frame that have unique names (e.g. id, group).

I tried using grep inside square brackets, which worked nicely:

panas <- bigdata[ , grep('panas', colnames(bigdata))]

Now I need to work out how to also include the other two columns that I need, id and group. I tried:

panas <- bigdata[ , c('id', 'group', grep('panas', colnames(bigdata)))]

but I get this error:

Error: Can't find columns 114, 115, 116, 117, 118, … (and 15 more) in .data. Call rlang::last_error() to see a backtrace.

How can I achieve this with the simplest code possible? I am an R newbie, so avoiding fancy functions would be ideal!

Here is a reproducible example.


> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

> newframe <- iris[ , grep('Petal', colnames(iris))] # This works

> newframe <- iris[ , c('Species', grep('Petal', colnames(iris)))] # This doesn't work

This time, the error is:

Error in `[.data.frame`(iris, , c("Species", grep("Petal", colnames(iris)))) :
  undefined columns selected


Solution

Assuming I understood what you would like to do, one possible solution is a small helper function (it may be more than you need):

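# Select columns by exact name (passed via ...) plus a partial-name match.
my_selector <- function(df, partial_name, ...) {
  # Positions of the exactly named columns; c(...) also accepts several separate names
  positional_names <- match(c(...), names(df))
  # Exact-name columns come first, then the columns matching partial_name
  df[, c(positional_names, grep(partial_name, names(df)))]
}
my_selector(iris, partial_name = "Petal", "Species")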

A "simpler" option would be to use grep and the like to match the target names at once:

iris[grep("Spec.*|Peta.*", names(iris))]

Or, even simpler, as suggested by @akrun:

iris[grep("(Spec|Peta).*", names(iris))]
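Before subsetting with a pattern like the ones above, it can help to check which column names it actually picks up. The following is a minimal sketch in base R; the `^` anchor and the variable names `pattern`, `matched` and `subset_df` are my own additions for illustration, not part of the original answer.

```r
# Uses the built-in iris data set.
# "^" anchors the pattern at the start of the column name (an assumption added
# here to avoid matching the fragment in the middle of some other name).
pattern <- "^(Spec|Peta)"

# value = TRUE returns the matching names instead of their positions
matched <- grep(pattern, names(iris), value = TRUE)
print(matched)

# Subset by name once the match looks right
subset_df <- iris[matched]
print(head(subset_df, 2))
```

Note that a pattern-based subset keeps the columns in the order in which they occur in the data frame, so Species comes last here rather than first.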

To select several exactly named columns with the function, pass them as a vector (output truncated):

my_selector(iris, partial_name = "Petal",c("Species","Sepal.Length"))
       Species Sepal.Length Petal.Length Petal.Width
1       setosa          5.1          1.4         0.2
2       setosa          4.9          1.4         0.2

Note, however, that the function orders the columns somewhat counter-intuitively: the exact names, although supplied last in the call, come first in the result, followed by the columns matched by partial_name.
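If the original column order of the data frame matters more than the order of the arguments, the positions can be sorted before subsetting. This is only a sketch of one possible variant; the helper name `select_in_order` and its `exact_names` argument are hypothetical and not part of the answer above.

```r
# Hypothetical variant of my_selector: returns the selected columns in the
# order in which they appear in df, regardless of argument order.
select_in_order <- function(df, partial_name, exact_names = character(0)) {
  exact_pos <- match(exact_names, names(df))
  # Fail early with a readable message if an exact name is not a column
  if (anyNA(exact_pos)) {
    stop("Unknown column(s): ",
         paste(exact_names[is.na(exact_pos)], collapse = ", "))
  }
  partial_pos <- grep(partial_name, names(df))
  # sort(unique(...)) restores data frame order and drops duplicates;
  # drop = FALSE keeps a data frame even if only one column is selected
  df[, sort(unique(c(exact_pos, partial_pos))), drop = FALSE]
}

res <- select_in_order(iris, "Petal", c("Species", "Sepal.Length"))
print(names(res))
```

With this variant, Sepal.Length comes first and Species last, mirroring the layout of iris itself.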

Result of the first call, my_selector(iris, partial_name = "Petal", "Species") (truncated):

       Species Petal.Length Petal.Width
1       setosa          1.4         0.2
2       setosa          1.4         0.2
3       setosa          1.3         0.2
4       setosa          1.5         0.2
5       setosa          1.4         0.2
6       setosa          1.7         0.4
7       setosa          1.4         0.3
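As a side note on why the attempt in the question fails: c() can only hold one type, so mixing the name 'Species' with the integer positions returned by grep() turns those positions into the strings "3" and "4", which are then looked up as column names. A minimal sketch of a fix that stays close to the question's own code is to ask grep() for names via value = TRUE; the variable names `mixed` and `cols` are mine, chosen for illustration.

```r
# Mixing a name with integer positions coerces everything to character
mixed <- c("Species", grep("Petal", colnames(iris)))
print(mixed)

# value = TRUE makes grep() return the matching names, so c() now
# combines names with names and the subset works
cols <- c("Species", grep("Petal", colnames(iris), value = TRUE))
newframe <- iris[, cols]
print(head(newframe, 3))
```

The same idea should carry over to the original data by swapping in bigdata, 'id', 'group' and 'panas', though that has not been tested against the asker's data.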


Answered By - NelsonGon
Answer Checked By - Senaida (PHPFixing Volunteer)