PHPFixing
  • Privacy Policy
  • TOS
  • Ask Question
  • Contact Us
  • Home
  • PHP
  • Programming
  • SQL Injection
  • Web3.0

Friday, October 7, 2022

[FIXED] How to find quantiles of an empirical cumulative density function (ECDF)

 October 07, 2022     distribution, r, statistics     No comments   

Issue

I am using ecdf() function to calculate empirical cumulative density function (ECDF) from some random samples:

set.seed(0)
X = rnorm(100)
P = ecdf(X)

Now P gives ECDF and we could plot it:

plot(P)
abline(h = 0.6, lty = 3)

ecdf

My question is: how can I find the sample value x, such that P(x) = 0.6, i.e., the 0.6-quantile of ECDF, or the x-coordinate of the intersection point of ECDF and h = 0.6?


Solution

In the following, I will not use ecdf(), as it is easy to obtain empirical cumulative density function (ECDF) ourselves.

First, we sort samples X in ascending order:

X <- sort(X)

ECDF at those samples takes function values:

e_cdf <- 1:length(X) / length(X)

We could then sketch ECDF by:

plot(X, e_cdf, type = "s")
abline(h = 0.6, lty = 3)

enter image description here

Now, we are looking for the first value of X, such that P(X) >= 0.6. This is just:

X[which(e_cdf >= 0.6)[1]]
# [1] 0.2290196

Since our data are sampled from standard normal distribution, the theoretical quantile is

qnorm(0.6)
# [1] 0.2533471

So our result is quite close.


Extension

Since the inverse of CDF is quantile function (for example, the inverse of pnorm() is qnorm()), one may guess the inverse of ECDF as sample quantile, i,e, the inverse ecdf() is quantile(). This is not true!

ECDF is a staircase / step function, and it does not have inverse. If we rotate ECDF around y = x, the resulting curve is not a mathematical function. So sample quantile is has nothing to do with ECDF.

For n sorted samples, sample quantile function is in fact a linear interpolation function of (x, y), with:

  • x-values being seq(0, 1, length = n);
  • y-values being sorted samples.

We can define our own version of sample quantile function by:

my_quantile <- function(x, prob) {
  if (is.unsorted(x)) x <- sort(x)
  n <- length(x)
  approx(seq(0, 1, length = n), x, prob)$y
  }

Let's have a test:

my_quantile(X, 0.6)
# [1] 0.2343171

quantile(X, prob = 0.6, names = FALSE)
# [1] 0.2343171

Note that result is different from what we get from X[which(e_cdf >= 0.6)[1]].

It is for this reason, that I refuse to use quantile() in my answer.



Answered By - Zheyuan Li
Answer Checked By - Clifford M. (PHPFixing Volunteer)
  • Share This:  
  •  Facebook
  •  Twitter
  •  Stumble
  •  Digg
Newer Post Older Post Home

0 Comments:

Post a Comment

Note: Only a member of this blog may post a comment.

Total Pageviews

Featured Post

Why Learn PHP Programming

Why Learn PHP Programming A widely-used open source scripting language PHP is one of the most popular programming languages in the world. It...

Subscribe To

Posts
Atom
Posts
Comments
Atom
Comments

Copyright © PHPFixing