PHPFixing
  • Privacy Policy
  • TOS
  • Ask Question
  • Contact Us
  • Home
  • PHP
  • Programming
  • SQL Injection
  • Web3.0

Friday, October 7, 2022

[FIXED] How to calculate summary stats over multiple columns in a dataframe with date/time series?

 October 07, 2022     mean, r, standard-deviation, statistics     No comments   

Issue

I have a dataframe where the first column is a date/time series and the other 9 columns are sites with associated water temperature. I want to calculate the monthly summary stats for each column. The summary stats I want are, monthly mean with the +/- sd temp, monthly proportion of time > 20C & 23C, monthly maximum temperature. Here is a sample df of my data

# Create a, b, c, d variables
a <- c("06-25-20 08:00:00 AM","06-25-20 08:15:00 AM",
       "06-25-20 08:30:00 AM","06-25-20 08:45:00 AM",
       "07-25-20 08:45:00 AM", "07-25-20 08:45:00 AM",
       "08-25-20 08:45:00 AM", "08-25-20 08:45:00 AM",
       "09-25-20 08:45:00 AM","09-25-20 08:45:00 AM")
b <- c(4,5,8, "N/A", 4,5,"N/A",7,7,6)
c <- c(6,10,8, "N/A", 8,5,"N/A",8,7,2)
# Join the variables to create a data frame
df <- data.frame(a,b,c)
df$a = as.POSIXlt(df$a, format="%m-%d-%y%H:%M:%S", tz = 'EST')

I started just trying to get the monthly mean which works, but for the life of me I cannot get the table to also include Standard deviation, and all the other summary stats without making a bunch of extra code. Here is code I used for mean

Monthly_2020Temp = df %>% 
  group_by(a = format(as.Date(a), '%b-%Y')) %>%
  summarise_each(funs( if(length(na.omit(.))>=15)
    mean(., na.rm=TRUE) else NA_real_), 
    b:c)

Here is the code I tried to use when adding standard deviation

Monthly_2020Temp = df %>% 
  group_by(a = format(as.Date(a), '%b-%Y')) %>%
  summarise_each(funs( if(length(na.omit(.))>=15)
    mean(., na.rm=TRUE) else NA_real_), sd(., na.rm=TRUE) else NA_real_), 
    b:c)

but I get an error

Error: unexpected 'else' in:
"  summarise_each(funs( if(length(na.omit(.))>=15)
    mean(., na.rm=TRUE) else NA_real_), sd(., na.rm=TRUE) else"
>     b:c)
Error: unexpected ')' in "    b:c)"

Can someone help me make a nice table of summary stats for my time series data?


Solution

It's best to get the data into "long" format when using dplyr summarize. Here's one possible approach to get you started:

df$b <- as.numeric(df$b)
df$c <- as.numeric(df$c)

df %>% pivot_longer(-c(a)) %>% 
  mutate(month = lubridate::month(a), year = lubridate::year(a)) %>% 
  group_by(month, year) %>% 
  summarize(avg = mean(value, na.rm = TRUE), sd =    sd(value, na.rm = TRUE))


Answered By - JeffR
Answer Checked By - Gilberto Lyons (PHPFixing Admin)
  • Share This:  
  •  Facebook
  •  Twitter
  •  Stumble
  •  Digg
Newer Post Older Post Home

0 Comments:

Post a Comment

Note: Only a member of this blog may post a comment.

Total Pageviews

Featured Post

Why Learn PHP Programming

Why Learn PHP Programming A widely-used open source scripting language PHP is one of the most popular programming languages in the world. It...

Subscribe To

Posts
Atom
Posts
Comments
Atom
Comments

Copyright © PHPFixing