Issue

I have a data frame with many Y and X variables. I would like to fit a separate simple linear model with lm() for every combination of X and Y variable. I'm working my way up to including the other Y variables, but I'm struggling just to iterate through the X variables.

My data looks something like this:

set.seed(200)
df <- data.frame(y1 = c(rnorm(n=20, mean = 5)),
                 y2 = c(rnorm(n=20, mean = 5)),
                 x1 = c(rnorm(n=20, mean = 13)), 
                 x2 = c(rnorm(n=20, mean = 14)), 
                 x3 = c(rnorm(n=20, mean = 15)))

I have tried multiple ways of fitting these models, but the best way seems to be using a for loop.

models <- list() #creating an empty list
for (i in names(df)[3:5]){ #choosing just the x-variables from the df
      
    models[[i]]   <- lm(y1 ~ get(i), df)
}

My outputs are in the models list, and I can access the statistics I want through summary(models[[1]]), but I don't want to do this separately for each fitted model. Is there a way to extract the statistics I want using do.call or map_df or something similar? Specifically, I want the r.squared, residual standard error, p-value, and f.statistic.

Solution

This example is based on Chapter 25 of Wickham & Grolemund's "R for Data Science". Give it a read for the explanation.
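As a rough orientation, the pattern from that chapter (nest one data frame per model, map lm() over the nested column, then unnest a one-row summary per model) might be sketched compactly as follows. This is only an illustrative sketch, assuming tidyr 1.0 or later (for pivot_longer() and the nest(data = ...) syntax) and an installed broom package; the names long, results, yvar and xvar are made up for the example.

```r
library(tidyverse)

set.seed(200)
df <- data.frame(y1 = rnorm(20, mean = 5),
                 y2 = rnorm(20, mean = 5),
                 x1 = rnorm(20, mean = 13),
                 x2 = rnorm(20, mean = 14),
                 x3 = rnorm(20, mean = 15))

# Reshape so that each row pairs one y value with one x value
# from the same observation, labelled by variable name
long <- df %>%
  pivot_longer(starts_with("y"), names_to = "yvar", values_to = "y") %>%
  pivot_longer(starts_with("x"), names_to = "xvar", values_to = "x")

results <- long %>%
  nest(data = c(x, y)) %>%                            # one row per (yvar, xvar) pair
  mutate(model = map(data, ~ lm(y ~ x, data = .x)),   # fit y ~ x on each nested frame
         stats = map(model, broom::glance)) %>%       # one-row summary per model
  select(yvar, xvar, stats) %>%                       # drop the list-columns we no longer need
  unnest(stats)

results %>% select(yvar, xvar, r.squared, sigma, statistic, p.value)
```

In the broom::glance() output, sigma is the residual standard error and statistic is the F statistic.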

library(tidyverse)  # attaches dplyr, tidyr and purrr; broom must be installed for broom::glance below

set.seed(200)
df <- data.frame(y1 = c(rnorm(n=20, mean = 5)),
                 y2 = c(rnorm(n=20, mean = 5)),
                 x1 = c(rnorm(n=20, mean = 13)), 
                 x2 = c(rnorm(n=20, mean = 14)), 
                 x3 = c(rnorm(n=20, mean = 15)))

# Set up the data so that each (x, y) pair becomes one row holding a nested data frame
dfy <- df %>% select(starts_with("y"))
dfx <- df %>% select(starts_with("x"))

dat_all <- data.frame()

for (y in names(dfy)) {
    for (x in names(dfx)) {
        r <- paste(x, "_", y)  # label for the pair, e.g. "x1 _ y1"
        data <- data.frame(x = dfx[[x]], y = dfy[[y]])
        # passing the frame as `data =` produces the columns data.x and data.y
        dd <- data.frame(vars = r, data = data) %>%
                group_by(vars) %>%
                nest()
        dat_all <- rbind(dat_all, dd)
    }
}

myModel <- function(df) {
    # regress the response (data.y) on the predictor (data.x), as in the question
    lm(data.y ~ data.x, data = df)
}


dat_all <- dat_all %>%
    mutate(model = map(data, myModel))


glance <- dat_all %>% 
    mutate(glance = map(model, broom::glance)) %>% 
    select(-data, -model) %>%  # drop the list-columns first; unnest()'s .drop argument is deprecated in tidyr >= 1.0
    unnest(glance)



glance %>%
    select(r.squared, p.value)


#  vars    r.squared p.value
#  <chr>       <dbl>   <dbl>
#1 x1 _ y1 0.00946     0.683
#2 x2 _ y1 0.00474     0.773
#3 x3 _ y1 0.00442     0.781
#4 x1 _ y2 0.106       0.162
#5 x2 _ y2 0.0890      0.201
#6 x3 _ y2 0.0000162   0.987
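
The question also asked for the residual standard error and the F statistic. broom::glance() returns these as the sigma and statistic columns, so they could be added to the select() call above. For a version without extra packages, a minimal base R sketch might look like this. The names fit_stats and combos are invented here, and the p-value is recomputed from the F statistic because summary.lm() does not store it directly.

```r
set.seed(200)
df <- data.frame(y1 = rnorm(20, mean = 5),
                 y2 = rnorm(20, mean = 5),
                 x1 = rnorm(20, mean = 13),
                 x2 = rnorm(20, mean = 14),
                 x3 = rnorm(20, mean = 15))

# Every (predictor, response) combination, one row per model to fit
combos <- expand.grid(x = grep("^x", names(df), value = TRUE),
                      y = grep("^y", names(df), value = TRUE),
                      stringsAsFactors = FALSE)

# Hypothetical helper: fit y ~ x and return the requested statistics as one row
fit_stats <- function(y, x, data) {
  s <- summary(lm(reformulate(x, response = y), data = data))
  f <- s$fstatistic  # named vector with elements value, numdf, dendf
  data.frame(
    y = y,
    x = x,
    r.squared   = s$r.squared,
    sigma       = s$sigma,  # residual standard error
    f.statistic = unname(f["value"]),
    p.value     = unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))
  )
}

# Fit all models and stack the one-row summaries into a single data frame
stats <- do.call(rbind, Map(fit_stats, combos$y, combos$x, MoreArgs = list(data = df)))
rownames(stats) <- NULL
stats
```

Using reformulate() instead of get(i) inside the formula keeps the real variable name in each model's coefficient table.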

Answered By - stomper

Answer Checked By - Katrina (PHPFixing Volunteer)

Monday, August 15, 2022

[FIXED] How to fit multiple models and extract model outputs from a nested list into a df
