Issue
I am working on time-series data. To extract features from the data I have to calculate the moving mean, median, mode, slope, kurtosis, skewness, etc. I am familiar with scipy.stats, which provides an easy way to calculate these quantities over a whole array, but for the moving/running version I have searched the whole internet and found nothing.
Surprisingly, the moving mean, median and mode are very easy to calculate with numpy. Unfortunately, there is no built-in function for calculating the moving kurtosis and skewness.
Can someone help me calculate the moving kurtosis and skewness with scipy? Many thanks.
Solution
Pandas offers a DataFrame.rolling() method which can be used, in combination with its Rolling.apply() method (i.e. df.rolling().apply()), to apply an arbitrary function to the specified rolling window.
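A minimal sketch of the pandas route, assuming a 1D series (a pd.Series has the same rolling API as a DataFrame column). raw=True hands each window to the function as a plain NumPy array, which avoids building a Series per window. pandas also ships built-in Rolling.kurt() and Rolling.skew(); note that these use the bias-corrected estimators, so they match scipy.stats.kurtosis/skew with bias=False, not SciPy's default bias=True.

```python
import numpy as np
import pandas as pd
import scipy.stats

s = pd.Series(np.arange(30, dtype=float))  # toy time series: 0, 1, ..., 29
window = 4
roll = s.rolling(window)

# Arbitrary function per window (SciPy defaults: Fisher definition, bias=True).
kurt = roll.apply(scipy.stats.kurtosis, raw=True)
skew = roll.apply(scipy.stats.skew, raw=True)

# Built-in rolling moments: bias-corrected, i.e. like scipy.stats.*(..., bias=False).
kurt_unbiased = roll.kurt()
skew_unbiased = roll.skew()

# The first window - 1 entries are NaN because those windows are incomplete.
print(kurt.head(6))
```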
If you are looking for a NumPy-based solution, you could use FlyingCircus Numeric (disclaimer: I am the main author of it).
There, you can find the following:
- flyingcircus_numeric.running_apply(): can apply any function to a 1D array and supports weights, but it is slow;
- flyingcircus_numeric.moving_apply(): can apply any function supporting an axis: int parameter to a 1D array and supports weights, and it is fast (but memory-hungry);
- flyingcircus_numeric.rolling_apply_nd(): can apply any function supporting an axis: int|Sequence[int] parameter to any ND array, and it is fast (and memory-efficient), but it does not support weights.
Based on your requirements, I would suggest using rolling_apply_nd(), e.g.:
```python
import numpy as np
import scipy as sp
import scipy.stats
import flyingcircus_numeric as fcn

NUM = 30
arr = np.arange(NUM)
window = 4
new_arr = fcn.rolling_apply_nd(arr, window, func=sp.stats.kurtosis)
print(new_arr)
# [-1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36
#  -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36 -1.36
#  -1.36 -1.36 -1.36]
```
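Why every value is -1.36: with its defaults (fisher=True, bias=True), scipy.stats.kurtosis computes the excess kurtosis from population central moments. Any 4 consecutive integers have deviations from their mean of -1.5, -0.5, 0.5, 1.5, so every window gives the same result (and, by symmetry, a skewness of 0):

$$
\begin{aligned}
m_k &= \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^k, \qquad g_1 = \frac{m_3}{m_2^{3/2}}, \qquad g_2 = \frac{m_4}{m_2^{2}} - 3 \\
m_2 &= \frac{2(1.5)^2 + 2(0.5)^2}{4} = \frac{4.5 + 0.5}{4} = 1.25 \\
m_4 &= \frac{2(1.5)^4 + 2(0.5)^4}{4} = \frac{10.125 + 0.125}{4} = 2.5625 \\
g_2 &= \frac{2.5625}{1.25^2} - 3 = 1.64 - 3 = -1.36
\end{aligned}
$$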
Of course, feel free to inspect the source code, it is open source (GPL).
EDIT
To give a feeling for the speeds involved, these are the benchmarks of the solutions implemented in FlyingCircus:
The general approach, flyingcircus_numeric.running_apply(), is a couple of orders of magnitude slower than either flyingcircus_numeric.rolling_apply_nd() or flyingcircus_numeric.moving_apply(), with rolling_apply_nd() being approx. one order of magnitude faster than moving_apply().
This shows the speed price paid for generality or for weighting support.
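To make "support for weighting" concrete, here is a minimal sketch of a weighted moving skewness and kurtosis. It assumes one common convention (weights normalized to sum to 1, weighted central moments, no bias correction); the function name weighted_moving_moments is hypothetical, and FlyingCircus may define weighting differently. With uniform weights it reduces to scipy.stats.skew/kurtosis with their defaults.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20


def weighted_moving_moments(arr, weights):
    """Hypothetical helper: weighted moving skewness and excess kurtosis.

    Assumed convention: normalized weights, weighted central moments,
    no small-sample bias correction.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so the weights sum to 1
    windows = sliding_window_view(np.asarray(arr, dtype=float), w.size)
    mean = windows @ w  # weighted mean of each window
    dev = windows - mean[:, None]
    m2 = (dev ** 2) @ w
    m3 = (dev ** 3) @ w
    m4 = (dev ** 4) @ w
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2 - 3.0  # Fisher (excess) kurtosis
    return skew, kurt


arr = np.arange(30, dtype=float)
skew_u, kurt_u = weighted_moving_moments(arr, np.ones(4))  # uniform weights
skew_w, kurt_w = weighted_moving_moments(arr, [1, 2, 2, 1])  # center-heavy weights
```

The extra per-window weighting work is part of why weight support costs speed.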
The above plots were obtained using the scripts from here and the following code:
```python
import numpy as np
import scipy as sp
import scipy.stats
import flyingcircus_numeric as fcn

# benchmark() and plot_benchmarks() come from the external benchmarking
# scripts mentioned above; they are not part of NumPy, SciPy or FlyingCircus.

WINDOW = 4
FUNC = sp.stats.kurtosis

def my_rolling_apply_nd(arr, window=WINDOW, func=FUNC):
    return fcn.rolling_apply_nd(arr, window, func=func)

def my_moving_apply(arr, window=WINDOW, func=FUNC):
    return fcn.moving_apply(arr, window, func)

def my_running_apply(arr, window=WINDOW, func=FUNC):
    return fcn.running_apply(arr, window, func)

def equal_output(a, b):
    # All implementations must agree within floating-point tolerance.
    return np.all(np.isclose(a, b))

input_sizes = (5, 10, 50, 100, 500, 1000, 5000, 10000, 50000, 100000)
funcs = my_rolling_apply_nd, my_moving_apply, my_running_apply
runtimes, input_sizes, labels, results = benchmark(
    funcs, gen_input=np.random.random, equal_output=equal_output,
    input_sizes=input_sizes)
plot_benchmarks(runtimes, input_sizes, labels, units='s')
plot_benchmarks(runtimes, input_sizes, labels, units='ms', zoom_fastest=8)
```
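For a quick self-contained check of the generality/speed trade-off without the external scripts, a sketch using only the standard library's timeit is below. The two hypothetical helpers mirror the trade-off (a per-window Python loop that accepts any function versus one vectorized call that needs an axis parameter); they are not the FlyingCircus implementations, and no timings are claimed here since they depend on the machine.

```python
import timeit

import numpy as np
import scipy.stats
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20


def loop_apply(arr, window, func):
    """Hypothetical: works with any 1D function, but calls it once per window."""
    out = np.empty(arr.size - window + 1)
    for i in range(out.size):
        out[i] = func(arr[i:i + window])
    return out


def vectorized_apply(arr, window, func):
    """Hypothetical: func must accept `axis`; one call over a strided view."""
    return func(sliding_window_view(arr, window), axis=-1)


arr = np.random.default_rng(0).random(5_000)
window = 4

# Same results (within tolerance) before comparing speed.
assert np.allclose(loop_apply(arr, window, scipy.stats.kurtosis),
                   vectorized_apply(arr, window, scipy.stats.kurtosis))

for f in (loop_apply, vectorized_apply):
    seconds = timeit.timeit(lambda: f(arr, window, scipy.stats.kurtosis), number=3) / 3
    print(f"{f.__name__}: {seconds * 1e3:.1f} ms per call")
```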
(EDITED to reflect some refactoring of FlyingCircus)
Answered By - norok2 Answer Checked By - Katrina (PHPFixing Volunteer)