Issue

I am using ecdf() function to calculate empirical cumulative density function (ECDF) from some random samples:

set.seed(0)
X = rnorm(100)
P = ecdf(X)

Now P gives ECDF and we could plot it:

plot(P)
abline(h = 0.6, lty = 3)

My question is: how can I find the sample value x, such that P(x) = 0.6, i.e., the 0.6-quantile of ECDF, or the x-coordinate of the intersection point of ECDF and h = 0.6?

Solution

In the following, I will not use ecdf(), as it is easy to obtain empirical cumulative density function (ECDF) ourselves.

First, we sort samples X in ascending order:

X <- sort(X)

ECDF at those samples takes function values:

e_cdf <- 1:length(X) / length(X)

We could then sketch ECDF by:

plot(X, e_cdf, type = "s")
abline(h = 0.6, lty = 3)

Now, we are looking for the first value of X, such that P(X) >= 0.6. This is just:

X[which(e_cdf >= 0.6)[1]]
# [1] 0.2290196

Since our data are sampled from standard normal distribution, the theoretical quantile is

qnorm(0.6)
# [1] 0.2533471

So our result is quite close.

Extension

Since the inverse of CDF is quantile function (for example, the inverse of pnorm() is qnorm()), one may guess the inverse of ECDF as sample quantile, i,e, the inverse ecdf() is quantile(). This is not true!

ECDF is a staircase / step function, and it does not have inverse. If we rotate ECDF around y = x, the resulting curve is not a mathematical function. So sample quantile is has nothing to do with ECDF.

For n sorted samples, sample quantile function is in fact a linear interpolation function of (x, y), with:

x-values being seq(0, 1, length = n);
y-values being sorted samples.

We can define our own version of sample quantile function by:

my_quantile <- function(x, prob) {
  if (is.unsorted(x)) x <- sort(x)
  n <- length(x)
  approx(seq(0, 1, length = n), x, prob)$y
  }

Let's have a test:

my_quantile(X, 0.6)
# [1] 0.2343171

quantile(X, prob = 0.6, names = FALSE)
# [1] 0.2343171

Note that result is different from what we get from X[which(e_cdf >= 0.6)[1]].

It is for this reason, that I refuse to use quantile() in my answer.

Answered By - Zheyuan Li

Answer Checked By - Clifford M. (PHPFixing Volunteer)

Friday, October 7, 2022

[FIXED] How to find quantiles of an empirical cumulative density function (ECDF)

Issue

Solution

Extension

0 Comments:

Post a Comment

Total Pageviews

Featured Post

Why Learn PHP Programming

Friday, October 7, 2022

Issue

Solution

Extension

0 Comments:

Post a Comment

Total Pageviews

Featured Post

Why Learn PHP Programming

Subscribe To