PHPFixing

Friday, October 7, 2022

[FIXED] How to calculate all aggregations at once without using a loop over indices?

pandas, rolling-computation, statistics

Issue

How to calculate all aggregations at once without using a loop over indices?

%%time
import random
import pandas as pd
random.seed(1)
df = pd.DataFrame({'val':random.sample(range(10), 10)})

for j in range(10):
    for i in df.index:
        df.loc[i,'mean_last_{}'.format(j)] = df.loc[(df.index < i) & (df.index >= i - j),'val'].mean()
        df.loc[i,'std_last_{}'.format(j)] = df.loc[(df.index < i) & (df.index >= i - j),'val'].std()
        df.loc[i,'max_last_{}'.format(j)] = df.loc[(df.index < i) & (df.index >= i - j),'val'].max()
        df.loc[i,'min_last_{}'.format(j)] = df.loc[(df.index < i) & (df.index >= i - j),'val'].min()
        df.loc[i,'median_last_{}'.format(j)] = df.loc[(df.index < i) & (df.index >= i - j),'val'].median()

Solution

I think what you're looking for is something like this:

import random
import pandas as pd
random.seed(1)
df = pd.DataFrame({'val':random.sample(range(10), 10)})

for j in range(1, 10):
    df[f'mean_last_{j}'] = df['val'].rolling(j, min_periods=1).mean()
    df[f'std_last_{j}'] = df['val'].rolling(j, min_periods=1).std()
    df[f'max_last_{j}'] = df['val'].rolling(j, min_periods=1).max()
    df[f'min_last_{j}'] = df['val'].rolling(j, min_periods=1).min()
    df[f'median_last_{j}'] = df['val'].rolling(j, min_periods=1).median()

However, my code is off by one relative to your example code. Do you intend each aggregation to include the value from the current row, or should it use only the previous j rows, without the current one? My code includes the current row, but yours does not. Your code also results in NaN values for the entire first group of aggregations (j = 0), because that window contains no rows.
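If the goal is to exclude the current row, as the loop in the question does, one option is to shift the column down by one row before rolling. The following is a minimal sketch, assuming the default integer index and window sizes starting at j = 1; the name `prev` is only illustrative and not part of the original answer.

```python
import random

import pandas as pd

random.seed(1)
df = pd.DataFrame({'val': random.sample(range(10), 10)})

# Shift by one row so that the window ending at row i covers
# rows i-j .. i-1 of the original column (current row excluded).
prev = df['val'].shift(1)

for j in range(1, 10):
    # min_periods=1 lets early rows use however many previous rows exist.
    window = prev.rolling(j, min_periods=1)
    df[f'mean_last_{j}'] = window.mean()
    df[f'std_last_{j}'] = window.std()
    df[f'max_last_{j}'] = window.max()
    df[f'min_last_{j}'] = window.min()
    df[f'median_last_{j}'] = window.median()
```

With this variant, row 0 is NaN for every statistic because it has no previous rows, and the standard deviation is NaN wherever only one previous value is available.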

Edit: The answer from @Carlos uses rolling(j).aggregate() to specify the list of aggregations in a single call. Here's what that looks like:

import random
import pandas as pd
random.seed(1)
df = pd.DataFrame({'val':random.sample(range(10), 10)})

aggs = ['mean', 'std', 'max', 'min', 'median']

for j in range(10):
    stats = df["val"].rolling(j, min_periods=min(j, 1)).aggregate(aggs)
    df[[f"{a}_last_{j}" for a in aggs]] = stats.values
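As a variation on the same idea, the per-window results can be collected and joined in one step instead of being assigned column by column. This is only a sketch under a few assumptions: it starts at j = 1, includes the current row (like the first snippet in this answer), and uses the hypothetical names `frames` and `result`.

```python
import random

import pandas as pd

random.seed(1)
df = pd.DataFrame({'val': random.sample(range(10), 10)})

aggs = ['mean', 'std', 'max', 'min', 'median']

# rolling(...).aggregate(list) on a Series returns a DataFrame with one
# column per aggregation; add_suffix turns 'mean' into 'mean_last_3', etc.
frames = [
    df['val'].rolling(j, min_periods=1).aggregate(aggs).add_suffix(f'_last_{j}')
    for j in range(1, 10)
]

# Join everything once, aligned on the index.
result = pd.concat([df] + frames, axis=1)
```

Concatenating once keeps the column names tied to the aggregation names rather than relying on the order of `stats.values`.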


Answered By - Stuart Berg
Answer Checked By - Cary Denson (PHPFixing Admin)