Issue
I'm using srvyr to calculate survey means of a variable (y) from a survey object, grouped by a categorical variable (x) from the same survey object. The basic code looks like this:
survey_means <- survey_object %>%
  filter( #remove NAs) %>%
  group_by(x) %>%
  summarise(Mean = survey_mean(y))
Suppose I want to instead put this block of code inside a function, which accepts the survey object and two variables as parameters. This is a simplified version of what I'm actually trying to do, which is a function that will handle up to a group of 4 or so variables, but this is the base case:
SurveyMeanFunc <- function(survey_object, x, y) {
  survey_means <- survey_object %>%
    filter( #remove NAs ) %>%
    group_by(survey_object[["variables"]][[x]]) %>%
    summarise(Mean = survey_mean(survey_object[["variables"]][[y]]))
  return(survey_means)
}
When I try to use this function, I always get an error message along the lines of:
! Assigned data `x` must be compatible with existing data.
x Existing data has n rows.
x Assigned data has m rows. (m > n)
i Only vectors of size 1 are recycled.
Even when I split up the pipes and verify that x has the same number of rows as y right before calling summarise(), I still get this message. What is summarise() doing that I don't understand?
[EDIT] Full Context with suggested changes:
SurveyMeanMedFunc <- function(survey_obj, xvar, yvar, categ1 = NULL, categ2 = NULL) {
  if (is.null(categ1) & is.null(categ2)) {
    survey_estimate <- survey_obj %>%
      filter(!is.na({{ xvar }}), !is.na({{ yvar }})) %>%
      group_by({{ xvar }}) %>%
      summarise(Mean = survey_mean({{ yvar }}, vartype = "ci"))
  } else if (is.null(categ2)) {
    survey_estimate <- survey_obj %>%
      filter(!is.na({{ xvar }}), !is.na({{ yvar }})) %>%
      group_by({{ xvar }}, {{ categ1 }}) %>%
      summarise(Mean = survey_mean({{ yvar }}, vartype = "ci"))
  } else {
    NULL #fix
  }
  return(survey_estimate)
}
The remaining issue: using quasiquotation to reference the survey variables works in the first branch of this if-else statement, but the function parameters are not recognised in the else if branch, even though they are passed the same way with {{ }}.
Solution
You don't give an example of how you want to use the function, but if I'm understanding correctly, you want to take your first block of code and run it with x replaced by the name of the variable passed in as the x argument and y by the name of the variable passed in as the y argument (only with the 'remove NAs' line deleted or fixed to do something).
That is, you want SurveyMeanFunc(my_design, species, height) to be
my_design %>%
  group_by(species) %>%
  summarise(Mean = survey_mean(height))
This is complicated because you don't want the value of x or the name x, you want the name species.
One way is quasiquotation, which used to require enquo and !! but now can be done more easily with the {{ }} operator:
SurveyMeanFunc <- function(survey_object, x, y) {
  survey_means <- survey_object %>%
    group_by({{ x }}) %>%
    summarise(Mean = survey_mean({{ y }}))
  survey_means
}
giving
> dstrata <- apistrat %>%
+   as_survey(strata = stype, weights = pw)
>
> SurveyMeanFunc(dstrata, stype, api00)
# A tibble: 3 × 3
  stype  Mean Mean_se
  <fct> <dbl>   <dbl>
1 E      674.    12.5
2 H      626.    15.5
3 M      637.    16.6
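For comparison, here is a rough sketch of the older enquo / !! style that {{ }} replaces. The name SurveyMeanFuncOld is made up for illustration, and the setup assumes the apistrat data shipped with the survey package, as in the output above.

```r
library(survey)  # provides the api example data sets
library(srvyr)   # attaches %>% and the dplyr verbs for survey objects

data(api)        # loads apistrat, among others

dstrata <- apistrat %>%
  as_survey(strata = stype, weights = pw)

# Hypothetical name; intended to behave like the {{ }} version.
SurveyMeanFuncOld <- function(survey_object, x, y) {
  x <- rlang::enquo(x)  # capture what the caller typed, without evaluating it
  y <- rlang::enquo(y)
  survey_object %>%
    group_by(!!x) %>%                    # unquote: acts like group_by(stype)
    summarise(Mean = survey_mean(!!y))   # acts like survey_mean(api00)
}

res_old <- SurveyMeanFuncOld(dstrata, stype, api00)
res_old
```

The two steps (capture with enquo, unquote with !!) are what {{ }} does in one go, which is why the shorter form is usually preferred now.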
Update
You still don't give an example of how you want to use the function, but I think this works:
SurveyMeanMedFunc <- function(survey_obj, xvar, yvar, categ1, categ2) {
  # missing() only checks whether an argument was supplied; it never evaluates it
  if (missing(categ1) && missing(categ2)) {
    survey_estimate <- survey_obj %>%
      filter(!is.na({{ xvar }}), !is.na({{ yvar }})) %>%
      group_by({{ xvar }}) %>%
      summarise(Mean = survey_mean({{ yvar }}, vartype = "ci"))
  } else if (missing(categ2)) {
    survey_estimate <- survey_obj %>%
      filter(!is.na({{ xvar }}), !is.na({{ yvar }})) %>%
      group_by({{ xvar }}, {{ categ1 }}) %>%
      summarise(Mean = survey_mean({{ yvar }}, vartype = "ci"))
  } else {
    survey_estimate <- survey_obj %>%
      filter(!is.na({{ xvar }}), !is.na({{ yvar }})) %>%
      group_by({{ categ2 }}, {{ categ1 }}) %>%
      summarise(Mean = survey_mean({{ yvar }}, vartype = "ci"))
  }
  return(survey_estimate)
}
The issue is that you can't evaluate categ1 or categ2 in the if condition if they are supplied by the user, because you're not evaluating them in a survey object. R doesn't know where to look. This is a problem because of the way the tidyverse uses unquoted variable names; if you supplied them as model formulas (as you would in survey) or as quoted strings you'd be ok.
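A small base R sketch of the difference, with no survey object involved. The function names and some_column are made up for illustration, and it assumes no object called some_column exists in your workspace.

```r
# Default of NULL plus is.null(): the argument is evaluated as ordinary R code.
uses_is_null <- function(categ1 = NULL) is.null(categ1)

# No default plus missing(): only asks whether the caller supplied something.
uses_missing <- function(categ1) missing(categ1)

uses_is_null()             # TRUE, nothing supplied
uses_missing()             # TRUE, nothing supplied

# A bare column name that is not an object in the calling environment:
uses_missing(some_column)  # FALSE, and some_column is never evaluated

# is.null() forces evaluation, so R goes looking for some_column and fails.
result <- tryCatch(
  uses_is_null(some_column),
  error = function(e) conditionMessage(e)
)
print(result)              # the error message about the object not being found
```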
The missing function asks whether an argument was supplied, which in this case is what you want. There's a more flexible is_missing/maybe_missing setup in the rlang package; you could look at that for another option. But this seems to work:
> SurveyMeanMedFunc(dstrata,stype,enroll,sch.wide,comp.imp)
# A tibble: 4 × 5
# Groups:   comp.imp [2]
  comp.imp sch.wide  Mean Mean_low Mean_upp
  <fct>    <fct>    <dbl>    <dbl>    <dbl>
1 No       No       1013.     810.    1216.
2 No       Yes       525.     438.     611.
3 Yes      No        370.     207.     533.
4 Yes      Yes       521.     475.     566.
> SurveyMeanMedFunc(dstrata,stype,enroll,sch.wide)
# A tibble: 6 × 5
# Groups:   stype [3]
  stype sch.wide  Mean Mean_low Mean_upp
  <fct> <fct>    <dbl>    <dbl>    <dbl>
1 E     No        420.     340.     499.
2 E     Yes       417.     381.     452.
3 H     No       1520.    1209.    1830.
4 H     Yes      1137.     946.    1328.
5 M     No        967.     709.    1226.
6 M     Yes       775.     669.     881.
> SurveyMeanMedFunc(dstrata,stype,enroll)
# A tibble: 3 × 4
  stype  Mean Mean_low Mean_upp
  <fct> <dbl>    <dbl>    <dbl>
1 E      417.     384.     450.
2 H     1321.    1134.    1508.
3 M      832.     722.     943.
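If you'd rather go the quoted-strings route mentioned earlier, something like the following might do. It is only a sketch: SurveyMeanStr and its argument names are made up, and it takes the grouping variables as a character vector so that one code path covers one, two, or more of them.

```r
library(survey)
library(srvyr)

data(api)
dstrata <- apistrat %>%
  as_survey(strata = stype, weights = pw)

# Hypothetical variant: column names are passed as character strings.
SurveyMeanStr <- function(survey_obj, groups, yvar) {
  group_syms <- rlang::syms(groups)  # strings -> symbols for grouping
  y_sym <- rlang::sym(yvar)

  # Build one !is.na(<column>) expression per column that is used
  na_checks <- lapply(
    rlang::syms(c(groups, yvar)),
    function(s) rlang::expr(!is.na(!!s))
  )

  survey_obj %>%
    filter(!!!na_checks) %>%     # splice in all the NA checks
    group_by(!!!group_syms) %>%  # splice in all the grouping columns
    summarise(Mean = survey_mean(!!y_sym, vartype = "ci"))
}

one_group <- SurveyMeanStr(dstrata, "stype", "enroll")
two_groups <- SurveyMeanStr(dstrata, c("stype", "sch.wide"), "enroll")
two_groups
```

Because the names arrive as ordinary strings, plain checks such as length(groups) or is.null() are safe inside the function, and there is no need for a separate branch per number of grouping variables.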
Answered By - Thomas Lumley Answer Checked By - Marie Seifert (PHPFixing Admin)