PHPFixing
Showing posts with label series. Show all posts

Friday, October 28, 2022

[FIXED] how to replace empty series values with NaN in python

Labels: dataframe, is-empty, pandas, python, series

Issue

I am iterating over a number of columns and storing their summary statistics like mean, median, skewness and kurtosis in a dict as below:

metrics_dict['skewness'] = data_col.skew().values[0]
metrics_dict['kurtosis'] = data_col.kurt().values[0]
metrics_dict['mean'] = np.mean(data_col)[0]
metrics_dict['median'] = np.median(data_col)

However, for some columns it raises the following error:

IndexError: index out of bounds

The column in question is below:

Index          device
61021           C:2
61022          D:3+
61023          D:3+
61024           B:1
61025          D:3+
61026           C:2 

I simply want to append NA to the dict for such a column instead of letting the error interrupt my loop. Here, Index is just the index of the dataframe, and the column being processed is device. Please note that the data has a large number of numeric columns (~500), of which 2-3 columns are like device, so I just need to add NA to the dict for these and move on to the next column. Can someone please tell me how to do that in Python?


Solution

Since these statistics are only meaningful for numeric columns, you can try isolating numeric columns. This is possible using pd.DataFrame.select_dtypes:

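import numpy as np

numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']

# Column labels whose dtype is one of the numeric types listed above
numeric_cols = df.select_dtypes(include=numerics).columns

for col in df:
    if col in numeric_cols:
        # calculate & add some values to dictionary, e.g. the skewness
        metrics_dict['skewness'] = df[col].skew()
    else:
        # add NA values to dictionary
        metrics_dict['skewness'] = np.nan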
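
As a rough, self-contained sketch of the approach above: the helper name summarize_columns, the sample data, and the nested per-column dict layout are assumptions made for illustration and are not part of the original answer. It also uses include="number" in place of the explicit dtype list, which selects all numeric dtypes.

```python
import numpy as np
import pandas as pd


def summarize_columns(df):
    """Return {column: {statistic: value}}, using NaN for non-numeric columns."""
    # "number" selects every integer and float dtype in one go
    numeric_cols = set(df.select_dtypes(include="number").columns)
    stat_names = ("mean", "median", "skewness", "kurtosis")

    summary = {}
    for col in df.columns:
        if col in numeric_cols:
            s = df[col]
            summary[col] = {
                "mean": s.mean(),
                "median": s.median(),
                "skewness": s.skew(),
                "kurtosis": s.kurt(),
            }
        else:
            # Non-numeric column (like device): record NaN and move on
            summary[col] = dict.fromkeys(stat_names, np.nan)
    return summary


# Hypothetical data: one numeric column plus a device-like string column
df = pd.DataFrame({
    "reading": [1.0, 2.0, 3.0, 4.0, 10.0],
    "device": ["C:2", "D:3+", "D:3+", "B:1", "D:3+"],
})
result = summarize_columns(df)
```

Because the dtype check happens before any statistic is computed, the loop never calls skew() or kurt() on a string column, so no exception handling is needed.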


Answered By - jpp
Answer Checked By - David Goodson (PHPFixing Volunteer)

Thursday, October 6, 2022

[FIXED] Why can't you calculate a summary statistic on a series with .apply?

Labels: apply, pandas, python, series, statistics

Issue

I'm experimenting with the .apply() method and noticed that pandas doesn't seem to allow using it to calculate a summary statistic on a series.

You can use .apply() to modify all values in a series, e.g.,

test = pd.Series([1,2,3])

t_output = test.apply(lambda a : a *2)

t_output

yielding a series with the doubled values 2, 4, 6.

However, if I wanted to generate a summary statistic, e.g.,

t_output2 = test.apply(np.sum)

t_output2

I just get the original series values back (1, 2, 3).

I know I could instead write test.sum() and get the desired summary statistic. But I'm interested in better understanding the technical reason why .apply() doesn't seem to permit generating one for an isolated series.

I know that a series is size-immutable, but that shouldn't be consequential here because I'm reassigning the .apply() output to a new object. So any details on better understanding this would be appreciated!


Solution

A dataframe is two-dimensional, thus you can apply some functionality along either dimension (i.e. on either rows or columns). As a series is one-dimensional, the functionality is applied on each element (i.e. value) of the series. Of course, if the entries of the series themselves have dimensions, then summation could be a useful function to apply, but if they are just numbers, applying summation is just the identity, as there is no dimension to summarize (i.e. sum) over.
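
A small sketch may make this concrete. The variable names elementwise, nested, nested_sums and total are made up for illustration; the series test is the one from the question.

```python
import numpy as np
import pandas as pd

test = pd.Series([1, 2, 3])

# Series.apply calls the function once per element, so np.sum only ever
# sees a single scalar and hands it back unchanged.
elementwise = test.apply(np.sum)

# When each element is itself a sequence, a per-element sum is meaningful.
nested = pd.Series([[1, 2], [3, 4, 5]])
nested_sums = nested.apply(np.sum)

# To reduce the whole series to one number, call the reduction directly.
total = test.sum()
```

Here elementwise holds the same values as test, nested_sums holds one sum per inner list, and only total is a single number.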



Answered By - Michael Hodel
Answer Checked By - Mildred Charles (PHPFixing Admin)

Saturday, April 23, 2022

[FIXED] How to print percentages outside the pie with python

Labels: pie-chart, python-3.x, series

Issue

I would like to have percentage values outside the pie. Maybe you can help

Here is my code :

import matplotlib.pyplot as plt
import pandas as pd

# Named data rather than dict, to avoid shadowing the built-in
data = {'a': 45, 'b': 123, 'c': 2, 'd': 1755, 'e': 13}
ser = pd.Series(data)

print(ser)

ser.plot(kind='pie', shadow=True, autopct='%1.2f%%')
plt.show()

In my case, the percentage values are not visible.


Solution

According to the docs, pctdistance controls the "ratio between the center of each pie slice and the start of the text generated by autopct". It has a default value of 0.6, which places the percentage values inside the pie.

Try a pctdistance value > 1:

>>> ser.plot(kind='pie', shadow=True, autopct='%1.2f%%', pctdistance=1.15)

(The above results in an "ugly" plot in which the percentages and the labels overlap. You can fix that by moving the labels further out with labeldistance, which defaults to 1.1.)
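
Putting both parameters together, a minimal sketch could look like the following. The labeldistance value of 1.35 is only a guess that may need tuning for other data, the Agg backend is selected so the sketch runs without a display, and the names counts, pct_texts and radii are illustrative.

```python
import math

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for on-screen use
import matplotlib.pyplot as plt
import pandas as pd

counts = {"a": 45, "b": 123, "c": 2, "d": 1755, "e": 13}
ser = pd.Series(counts)

ax = ser.plot(
    kind="pie",
    autopct="%1.2f%%",
    pctdistance=1.15,    # > 1 puts the percentages outside the pie
    labeldistance=1.35,  # push the labels further out than the percentages
)

# The percentage texts are the ones ending in "%"; measure how far each
# one sits from the pie centre (the pie itself has radius 1).
pct_texts = [t for t in ax.texts if t.get_text().endswith("%")]
radii = [math.hypot(*t.get_position()) for t in pct_texts]
```

In an interactive session, call plt.show() afterwards to display the figure. Very small slices sit close together, so their texts can still collide no matter which distances are chosen.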



Answered By - Nikolaos Chatzis
Answer Checked By - Pedro (PHPFixing Volunteer)
Copyright © PHPFixing