Issue
I have a very large data set that includes multiple columns with common portions of their names (e.g. ctq_1, ctq_2, ctq_3 and also panas_1, panas_2, panas_3). I'd like to subset some of those columns (e.g. only those containing 'panas' in the column name) alongside certain other columns from the same data frame that have unique names (e.g. id, group).
I tried using a grep function inside square brackets, which worked nicely:
panas <- bigdata[ , grep('panas', colnames(bigdata))]
but now I need to work out how to also include the other two columns that I need, which are id and group. I tried:
panas <- bigdata[ , c('id', 'group', grep('panas', colnames(bigdata)))]
but I get this error:
Error: Can't find columns `114`, `115`, `116`, `117`, `118`, … (and 15 more) in `.data`.
Call `rlang::last_error()` to see a backtrace.
How can I achieve what I want to with the simplest code possible? I am an R newbie so avoiding fancy functions would be ideal!
Here is a reproducible example.
> head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
> newframe <- iris[ , grep('Petal', colnames(iris))] # This works
> newframe <- iris[ , c('Species', grep('Petal', colnames(iris)))] # This doesn't work
This time, the error is:
Error in `[.data.frame`(iris, , c("Species", grep("Petal", colnames(iris)))) :
  undefined columns selected
Solution
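Assuming I have understood what you would like to do, one possible (if somewhat redundant) solution is a small helper function:
# Select the columns named exactly in ... plus every column whose name matches partial_name
my_selector <- function(df, partial_name, ...) {
  # c(...) lets the exact names be passed as one vector or as separate arguments
  positional_names <- match(c(...), names(df))
  # exact-name positions first, then the positions found by grep
  df[, c(positional_names, grep(partial_name, names(df)))]
}
my_selector(iris, partial_name = "Petal", "Species")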
A "simpler" option would be to use grep and the like to match the target names at once:
iris[grep("Spec.*|Peta.*", names(iris))]
Or even simpler, as suggested by @akrun, we can do:
iris[grep("(Spec|Peta).*", names(iris))]
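One thing to keep in mind with the two regex one-liners: grep returns positions in the data frame's own column order, and an unanchored pattern can match anywhere in a name. A minimal sketch, assuming you want names that start with the given prefixes (the ^ anchor and the variable name cols are my additions, not part of the suggestion above):

```r
# ^ anchors the match to the start of the column name
cols <- grep("^(Species|Petal)", names(iris))

# Positions come back in data-frame order, so Species (column 5) is last
cols
# [1] 3 4 5

names(iris)[cols]
# [1] "Petal.Length" "Petal.Width"  "Species"

subset_by_regex <- iris[cols]
```

If the order of the columns in the result matters, the helper function or an explicit vector of names gives more control.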
For more columns, we could do something like:
my_selector(iris, partial_name = "Petal",c("Species","Sepal.Length"))
Species Sepal.Length Petal.Length Petal.Width
1 setosa 5.1 1.4 0.2
2 setosa 4.9 1.4 0.2
Note, however, that the function above orders the columns somewhat counter-intuitively: the exact names, although supplied last in the call, come first in the result, followed by the columns matched by partial_name.
Result of the first my_selector call (truncated):
Species Petal.Length Petal.Width
1 setosa 1.4 0.2
2 setosa 1.4 0.2
3 setosa 1.3 0.2
4 setosa 1.5 0.2
5 setosa 1.4 0.2
6 setosa 1.7 0.4
7 setosa 1.4 0.3
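As for why the original attempt failed: grep returns integer positions by default, and c() coerces them to character when they are combined with 'Species', so R goes looking for columns literally named "3" and "4". A minimal sketch of a fix that stays close to the code in the question, using grep's value = TRUE argument so that it returns the matching names instead (the variable names bad and keep are only for illustration):

```r
# Combining a string with integer positions coerces the positions to text
bad <- c("Species", grep("Petal", colnames(iris)))
bad
# [1] "Species" "3"       "4"

# value = TRUE makes grep return the matching column names
keep <- c("Species", grep("Petal", colnames(iris), value = TRUE))
newframe <- iris[, keep]
names(newframe)
# [1] "Species"      "Petal.Length" "Petal.Width"
```

For the data in the question, the same idea would presumably read bigdata[ , c('id', 'group', grep('panas', colnames(bigdata), value = TRUE))], assuming id and group are the exact column names.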
Answered By - NelsonGon Answer Checked By - Senaida (PHPFixing Volunteer)