PHPFixing
  • Privacy Policy
  • TOS
  • Ask Question
  • Contact Us
  • Home
  • PHP
  • Programming
  • SQL Injection
  • Web3.0

Friday, October 7, 2022

[FIXED] How to calculate the average of multiple standard deviations in R?

 October 07, 2022     dplyr, r, standard-deviation, statistics     No comments   

Issue

I am trying to figure out how to calculate the standard deviation of a dataset when I have a couple of standard deviations. Let's just look at this MWE:

set.seed(1234)
dummy_data <- data.frame(
  "col_1" = sample(1:7, size = 10, replace = TRUE),
  "col_2" = sample(1:7, size = 10, replace = TRUE),
  "col_3" = sample(1:7, size = 10, replace = TRUE),  
  "col_4" = sample(1:7, size = 10, replace = TRUE)
)

Now since I know all the data points I can calculate the total standard deviation as follows:

> sd(as.matrix(dummy_data))
[1] 1.727604

But the real data that I have at hand is the following:

> dplyr::summarise_all(dummy_data, sd)
     col_1    col_2   col_3    col_4
1 1.837873 1.873796 1.37032 1.888562

If I follow the usual method of calculating the average of multiple standard deviations with similar sample sizes, I would apply the following:

sds <- dplyr::summarise_all(dummy_data, sd)
vars <- sds^2
mean_sd <- sqrt(sum(vars) / (length(vars) - 1))

> mean_sd
[1] 2.027588

which is not the same! Now I have tried without the minus one:

> sqrt(sum(vars) / (length(vars)))
[1] 1.755942

which does not solve the problem. I have tried defining an own standard deviation function like this:

own_sd <- function(x) {
  sqrt(sum((x - mean(x))^2) / length(x))
}

to eliminate the x - 1 in the dplyr::summarise_all() step, and then average according to the step above:

> sqrt(sum(dplyr::summarise_all(dummy_data, own_sd)^2) / 3)
[1] 1.923538
> sqrt(sum(dplyr::summarise_all(dummy_data, own_sd)^2) / 4)
[1] 1.665833

But all seem to give a different result than the sd(as.matrix()) method. What is going wrong here?


Solution

You can't calculate a global SD from only knowing group SDs. For example:

x1 = 1:5
x2 = 11:15
x3 = 101:105

## all the SDs are equal
(sd1 = sd(x1))
#[1] 1.581139
(sd2 = sd(x2))
#[1] 1.581139
(sd3 = sd(x3))
#[1] 1.581139

## however, combining the groups in pairs give very different results
sd(c(x1, x2))
# [1] 5.477226

sd(c(x1, x3))
# [1] 52.72571

This demonstrates that even if the sample sizes are identical, knowing the standard deviation of two groups does not help you calculate the standard deviation of those groups combined.

As per Merijn van Tilborg's comment, if you know the group sizes and the group means, the calculation is possible as shown here.



Answered By - Gregor Thomas
Answer Checked By - Senaida (PHPFixing Volunteer)
  • Share This:  
  •  Facebook
  •  Twitter
  •  Stumble
  •  Digg
Newer Post Older Post Home

0 Comments:

Post a Comment

Note: Only a member of this blog may post a comment.

Total Pageviews

Featured Post

Why Learn PHP Programming

Why Learn PHP Programming A widely-used open source scripting language PHP is one of the most popular programming languages in the world. It...

Subscribe To

Posts
Atom
Posts
Comments
Atom
Comments

Copyright © PHPFixing