PHPFixing

Friday, October 7, 2022

[FIXED] How to interpret the z-score of a column to find the distribution type?

Tags: normal-distribution, pandas, python, statistics

Issue

I have a pandas DataFrame with a couple of columns.

I calculated the z-score of one of the columns from its mean and standard deviation.

Now I would like to know which distribution the column follows, based on the z-score. From the histogram I can tell it is a normal distribution.

Is there a programmatic way to tell the distribution type from the z-score?

I'm new to statistics, so maybe I'm missing something very simple.

Sample code:

df[col_zscore] = (df[column] - df[column].mean())/df[column].std(ddof=0)

Solution

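If the distribution is normal, the 68–95–99.7 rule says that about 68% of df[col_zscore] lies between -1 and 1, about 95% between -2 and 2, and about 99.7% between -3 and 3. Matching these percentages is a sign that the data is close to normal, not a proof. At the other extreme, if the column holds a single fixed value, the standard deviation is 0 and the z-score is 0/0, which pandas returns as NaN rather than infinity.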
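The three percentages in the rule are rounded values of the normal distribution's coverage within k standard deviations, which equals erf(k/√2). As a quick sanity check, here is a minimal sketch using only the standard library (the helper name normal_coverage is made up for this illustration):

```python
import math

def normal_coverage(k: float) -> float:
    """Percentage of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2)) * 100

for k in (1, 2, 3):
    # Prints 68.27%, 95.45% and 99.73% for k = 1, 2, 3
    print(f"within +/-{k} sigma: {normal_coverage(k):.2f}%")
```

These are the targets that an empirical count of z-scores should land near when the data is normal.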

You can check whether the column is close to normal, or is a fixed value, with the following function:

import math
import numpy as np

def three_sigma_rule(zscores):
  # Accepts a pandas Series (or any array-like) of z-scores.
  z = np.asarray(zscores, dtype=float)
  # Percentage of values strictly inside +/-1, +/-2 and +/-3.
  # NaN never satisfies the comparison, so it counts as "outside".
  one_sigma = (np.abs(z) < 1).mean() * 100
  two_sigma = (np.abs(z) < 2).mean() * 100
  three_sigma = (np.abs(z) < 3).mean() * 100
  print("Percentage of the z-score between -1 to 1: {0}%".format(one_sigma))
  print("Percentage of the z-score between -2 to 2: {0}%".format(two_sigma))
  print("Percentage of the z-score between -3 to 3: {0}%".format(three_sigma))
  # rel_tol=0.1 is a loose tolerance: within 10% of the larger of the two values.
  condition1 = math.isclose(one_sigma, 68, rel_tol=0.1)
  condition2 = math.isclose(two_sigma, 95, rel_tol=0.1)
  condition3 = math.isclose(three_sigma, 99.7, rel_tol=0.1)
  # All-NaN z-scores mean the standard deviation was 0, i.e. every value is the same.
  condition4 = np.isnan(z).all()
  if condition1 and condition2 and condition3:
    print("It is close to a normal distribution.")
  if condition4:
    print("It is a fixed value.")
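A coverage check like this only looks at three bands, so it can be worth pairing it with a second shape check. The sketch below is not part of the original answer: it computes skewness and excess kurtosis from the z-scores, both of which are close to 0 for normal data. The function name skew_and_excess_kurtosis and the sample parameters are assumptions chosen for illustration, and it uses only the standard library.

```python
import random
import statistics

def skew_and_excess_kurtosis(values):
    """Return (skewness, excess kurtosis) computed from population moments.

    Both are close to 0 for normally distributed data.
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        # Constant data: z-scores are undefined, so these statistics are too.
        return float("nan"), float("nan")
    z = [(v - mean) / sd for v in values]
    skew = statistics.fmean(t ** 3 for t in z)
    excess_kurt = statistics.fmean(t ** 4 for t in z) - 3
    return skew, excess_kurt

rng = random.Random(0)  # fixed seed so the run is repeatable
normal_sample = [rng.gauss(5, 3) for _ in range(100_000)]
uniform_sample = [rng.uniform(-100, 10_000) for _ in range(100_000)]

print(skew_and_excess_kurtosis(normal_sample))   # both values near 0
print(skew_and_excess_kurtosis(uniform_sample))  # excess kurtosis near -1.2
```

A uniform sample is symmetric, so its skewness is also near 0; it is the excess kurtosis of about -1.2 that separates it from a normal sample.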

Let's generate some random numbers:

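if __name__ == "__main__":
  import pandas as pd
  import numpy as np

  n = 100000
  df = pd.DataFrame(dict(
    # a: normal with mean 5 and standard deviation 3
    a=np.random.normal(5,3,size=n),
    # b: uniform on [-100, 10000)
    b=np.random.uniform(low=-100, high=10000, size=n),
    # c: low == high, so every value is exactly 5 (a fixed value)
    c=np.random.uniform(low=5, high=5, size=n),
  ))
  # z-score with the population standard deviation (ddof=0)
  df['a_zscore'] = (df['a'] - df['a'].mean())/df['a'].std(ddof=0)
  df['b_zscore'] = (df['b'] - df['b'].mean())/df['b'].std(ddof=0)
  # c has zero standard deviation, so this column is all NaN (0/0)
  df['c_zscore'] = (df['c'] - df['c'].mean())/df['c'].std(ddof=0)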

Output of three_sigma_rule(df['a_zscore']):

[Image: screenshot of the printed output]



Answered By - Ka-Wa Yip
Answer Checked By - David Marino (PHPFixing Volunteer)