Friday, October 7, 2022

[FIXED] How do I pass arguments to srvyr inside of a function?

Tags: dataframe, r, statistics, survey, tibble

Issue

I'm using srvyr to calculate survey means of a variable (y) from a survey object, grouped by a categorical variable (x) from that same survey object. The basic code looks like this:

survey_means <- survey_object %>%
  filter(
    # remove NAs
  ) %>%
  group_by(x) %>%
  summarise(Mean = survey_mean(y))

Suppose I want to instead put this block of code inside a function, which accepts the survey object and two variables as parameters. This is a simplified version of what I'm actually trying to do, which is a function that will handle up to a group of 4 or so variables, but this is the base case:

SurveyMeanFunc <- function(survey_object, x, y) {

  survey_means <- survey_object %>%
    filter(
      # remove NAs
    ) %>%
    group_by(survey_object[["variables"]][[x]]) %>%
    summarise(Mean = survey_mean(survey_object[["variables"]][[y]]))

  return(survey_means)

}

Whenever I call this function I get an error message along the lines of:

! Assigned data `x` must be compatible with existing data.
x Existing data has n rows.
x Assigned data has m rows. (m > n)
i Only vectors of size 1 are recycled.

Even when I split up the pipes and verify that x has the same number of rows as y right before calling summarise(), I still get this message. What is summarise() doing that I don't understand?

[EDIT] Full Context with suggested changes:

SurveyMeanMedFunc <- function(survey_obj, xvar, yvar, categ1= NULL, categ2= NULL) {
  
  if (is.null(categ1) & is.null(categ2)) {
    
    survey_estimate <- survey_obj %>%
      filter(!is.na({{ xvar }}), !is.na({{ yvar }})) %>%
      group_by({{ xvar }}) %>%
      summarise(Mean = survey_mean({{ yvar }}, vartype = "ci"))
              
  } else if (is.null(categ2)) {
    
    survey_estimate <- survey_obj %>%
      filter(!is.na({{ xvar }}), !is.na({{ yvar }})) %>%
      group_by({{ xvar }}, {{ categ1 }}) %>%
      summarise(Mean = survey_mean({{ yvar }}, vartype = "ci"))
    
  } else {
    
    NULL #fix
    
  }
  
  return(survey_estimate)
  
}

The remaining issue is that quasiquotation solves the problem of referencing the survey variables in the first branch of this if-else statement, but the function parameters are not recognised inside the else if branch, even though they are wrapped in {{ }} in the same way.


Solution

You don't give an example of how you want to use the function, but if I'm understanding correctly, you want to take your first block of code and run it with x replaced by the name of the variable passed in as the x argument, and y replaced by the name of the variable passed in as the y argument (with the 'remove NAs' line either deleted or fixed to do something).

That is, you want SurveyMeanFunc(my_design, species, height) to be equivalent to

my_design %>%
 group_by(species) %>%
 summarise(Mean = survey_mean(height)) 

This is complicated because you don't want the value of x, or the name x. You want the name species.
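The difference between the value of x, the name x, and the name the caller typed can be seen in base R alone. The following is only a sketch; arg_as_text is a hypothetical helper that is not part of srvyr or of the code in the question.

```r
# Hypothetical helper, for illustration only: returns the expression the
# caller typed for `x` as text, without evaluating it.
arg_as_text <- function(x) {
  deparse(substitute(x))
}

# `species` does not need to exist as an object, because it is never
# evaluated. Inside the function the parameter is called `x`, but
# substitute() recovers what the caller wrote.
typed <- arg_as_text(species)
print(typed)
```

Tidyverse-style functions do something similar for their column arguments, which is why a wrapper needs a way to forward the caller's expression rather than its own parameter name.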

One way is quasiquotation, which used to require enquo and !! but can now be done more easily with the {{ }} operator:

SurveyMeanFunc <- function(survey_object, x, y) {
  survey_means <- survey_object %>%
    group_by({{ x }}) %>%
    summarise(Mean = survey_mean({{ y }}))
  survey_means
}

giving

> dstrata <- apistrat %>%
+   as_survey(strata = stype, weights = pw)
> 
> SurveyMeanFunc(dstrata, stype, api00)
# A tibble: 3 × 3
  stype  Mean Mean_se
  <fct> <dbl>   <dbl>
1 E      674.    12.5
2 H      626.    15.5
3 M      637.    16.6
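
For comparison, the older enquo and !! idiom mentioned above and the {{ }} form can be put side by side. This sketch makes some simplifying assumptions: it uses plain dplyr on the built-in mtcars data frame, with mean() standing in for survey_mean(), and the function names mean_by_enquo and mean_by_embrace are hypothetical.

```r
library(dplyr)

# Older idiom: capture each argument with enquo(), then unquote with !!.
mean_by_enquo <- function(data, x, y) {
  x <- enquo(x)
  y <- enquo(y)
  data %>%
    group_by(!!x) %>%
    summarise(Mean = mean(!!y))
}

# Newer idiom: {{ }} captures and unquotes in one step.
mean_by_embrace <- function(data, x, y) {
  data %>%
    group_by({{ x }}) %>%
    summarise(Mean = mean({{ y }}))
}

old_result <- mean_by_enquo(mtcars, cyl, mpg)
new_result <- mean_by_embrace(mtcars, cyl, mpg)
print(all.equal(old_result, new_result))
```

Both versions forward the caller's column names into the pipeline; the {{ }} form just has less ceremony.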

Update

You still don't give an example of how you want to use the function, but I think this works:

SurveyMeanMedFunc <- function(survey_obj, xvar, yvar, categ1, categ2) {
  
  if (missing(categ1) && missing(categ2)) {
    
    survey_estimate <- survey_obj %>%
      filter(!is.na({{ xvar }}), !is.na({{ yvar }})) %>%
      group_by({{ xvar }}) %>%
      summarise(Mean = survey_mean({{ yvar }}, vartype = "ci"))
              
  } else if (missing(categ2)) {
    
    survey_estimate <- survey_obj %>%
      filter(!is.na({{ xvar }}), !is.na({{ yvar }})) %>%
      group_by({{ xvar }}, {{ categ1 }}) %>%
      summarise(Mean = survey_mean({{ yvar }}, vartype = "ci"))
    
  } else {
    
    survey_estimate <- survey_obj %>%
      filter(!is.na({{ xvar }}), !is.na({{ yvar }})) %>%
      group_by({{ categ2 }}, {{ categ1 }}) %>%
      summarise(Mean = survey_mean({{ yvar }}, vartype = "ci"))
    
  }
  
  return(survey_estimate)
  
}

The issue is that you can't evaluate categ1 or categ2 in the if condition when they are supplied by the user, because you're not evaluating them in a survey object, so R doesn't know where to look. This is a consequence of the way the tidyverse uses unquoted variable names: if you supplied them as model formulas (as you would in survey) or as quoted strings, you'd be OK.
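
As a rough sketch of the quoted-string route, assuming the survey and srvyr packages are installed and using the apistrat data from survey: SurveyMeanStrFunc is a hypothetical name, and rlang::sym()/rlang::syms() are used here to turn the strings into column names. Because the arguments are ordinary strings, is.null() can safely be applied to them, and c() drops a NULL grouping variable.

```r
library(survey)
library(srvyr)

data(api)  # provides apistrat
dstrata <- apistrat %>%
  as_survey(strata = stype, weights = pw)

# Hypothetical variant that takes column names as strings.
SurveyMeanStrFunc <- function(survey_obj, xvar, yvar, categ1 = NULL) {
  # c() silently drops NULL, so this is either one or two group names.
  group_vars <- c(xvar, categ1)

  survey_obj %>%
    filter(!is.na(!!rlang::sym(xvar)), !is.na(!!rlang::sym(yvar))) %>%
    group_by(!!!rlang::syms(group_vars)) %>%
    summarise(Mean = survey_mean(!!rlang::sym(yvar), vartype = "ci"))
}

by_stype <- SurveyMeanStrFunc(dstrata, "stype", "enroll")
by_stype_schwide <- SurveyMeanStrFunc(dstrata, "stype", "enroll", "sch.wide")
```

The trade-off is that callers must quote the column names, which is less convenient interactively but easier to drive from a loop over variable names.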

The missing() function asks whether an argument was supplied, which is what you want in this case. The rlang package has a more flexible is_missing()/maybe_missing() setup that you could look at as another option. But this seems to work:

> SurveyMeanMedFunc(dstrata,stype,enroll,sch.wide,comp.imp)
# A tibble: 4 × 5
# Groups:   comp.imp [2]
  comp.imp sch.wide  Mean Mean_low Mean_upp
  <fct>    <fct>    <dbl>    <dbl>    <dbl>
1 No       No       1013.     810.    1216.
2 No       Yes       525.     438.     611.
3 Yes      No        370.     207.     533.
4 Yes      Yes       521.     475.     566.
> SurveyMeanMedFunc(dstrata,stype,enroll,sch.wide)
# A tibble: 6 × 5
# Groups:   stype [3]
  stype sch.wide  Mean Mean_low Mean_upp
  <fct> <fct>    <dbl>    <dbl>    <dbl>
1 E     No        420.     340.     499.
2 E     Yes       417.     381.     452.
3 H     No       1520.    1209.    1830.
4 H     Yes      1137.     946.    1328.
5 M     No        967.     709.    1226.
6 M     Yes       775.     669.     881.
> SurveyMeanMedFunc(dstrata,stype,enroll)
# A tibble: 3 × 4
  stype  Mean Mean_low Mean_upp
  <fct> <dbl>    <dbl>    <dbl>
1 E      417.     384.     450.
2 H     1321.    1134.    1508.
3 M      832.     722.     943.
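
The difference between testing with is.null() and testing with missing(), as described above, can be reproduced in base R without any survey object. This is a minimal sketch; has_categ_null and has_categ_missing are hypothetical helpers, and sch.wide is assumed not to exist as an object in the calling environment.

```r
# Hypothetical helpers, for illustration only.
has_categ_null <- function(categ1 = NULL) {
  !is.null(categ1)   # forces evaluation of `categ1`
}

has_categ_missing <- function(categ1) {
  !missing(categ1)   # only checks whether an argument was supplied
}

# `sch.wide` is a column name, not an object R can find on its own,
# so evaluating it in the is.null() version raises an error.
null_result <- tryCatch(
  has_categ_null(sch.wide),
  error = function(e) conditionMessage(e)
)
print(null_result)

# missing() never evaluates the argument, so both calls succeed.
supplied <- has_categ_missing(sch.wide)
not_supplied <- has_categ_missing()
print(c(supplied, not_supplied))
```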


Answered By - Thomas Lumley
Answer Checked By - Marie Seifert (PHPFixing Admin)