
Friday, October 28, 2022

[FIXED] how to replace empty series values with NaN in python

October 28, 2022 dataframe, is-empty, pandas, python, series No comments

Issue

I am iterating over a number of columns and storing their summary statistics like mean, median, skewness and kurtosis in a dict as below:

metrics_dict['skewness'] = data_col.skew().values[0]
metrics_dict['kurtosis'] = data_col.kurt().values[0]
metrics_dict['mean'] = np.mean(data_col)[0]
metrics_dict['median'] = np.median(data_col)

However, for some columns it raises the following error:

IndexError: index out of bounds

The column in question is below:

Index          device
61021           C:2
61022          D:3+
61023          D:3+
61024           B:1
61025          D:3+
61026           C:2

I simply want to append NA to the dict for such a column instead of letting the error interrupt my loop. Here Index is just the index of the dataframe, and the column being processed is device. Please note that the data has a large number of numeric columns (~500), of which 2-3 are like device, so I just need to add NA to the dict for these and move on to the next column. Can someone please tell me how to do that in Python?

Solution

Since these statistics are only meaningful for numeric columns, you can try isolating numeric columns. This is possible using pd.DataFrame.select_dtypes:

numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']

numeric_cols = df.select_dtypes(include=numerics).columns

for col in df:
    if col in numeric_cols:
        pass  # calculate & add some values to dictionary
    else:
        pass  # add NA values to dictionary
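
As a rough illustration of how the two branches above might be filled in, here is a minimal sketch. The sample dataframe, the nested metrics dictionary and the stat_names list are hypothetical stand-ins and are not part of the original answer; np.number is used as shorthand for selecting all numeric dtypes.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two numeric columns plus a label column like `device`.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [10, 20, 30, 100],
    "device": ["C:2", "D:3+", "D:3+", "B:1"],
})

# np.number selects every numeric dtype without listing them one by one.
numeric_cols = df.select_dtypes(include=np.number).columns

stat_names = ["mean", "median", "skewness", "kurtosis"]  # assumed set of statistics
metrics = {}
for col in df.columns:
    if col in numeric_cols:
        s = df[col]
        metrics[col] = {
            "mean": s.mean(),
            "median": s.median(),
            "skewness": s.skew(),
            "kurtosis": s.kurt(),
        }
    else:
        # Non-numeric column: record NaN for every statistic and move on.
        metrics[col] = dict.fromkeys(stat_names, np.nan)
```

Calling pd.DataFrame(metrics).T afterwards should give one row per column, with NaN in the rows for non-numeric columns such as device.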

Answered By - jpp

Answer Checked By - David Goodson (PHPFixing Volunteer)

[FIXED] Why can't you calculate a summary statistic on a series with .apply?

October 06, 2022 apply, pandas, python, series, statistics No comments

Issue

I'm experimenting with the .apply() method and noticed that Python doesn't seem to allow using it to calculate a summary statistic on a series.

You can use .apply() to modify all values in a series, e.g.,

test = pd.Series([1,2,3])

t_output = test.apply(lambda a : a *2)

t_output

yielding a series with each value doubled (2, 4, 6).

However, if I wanted to generate a summary statistic, e.g.,

t_output2 = test.apply(np.sum)

t_output2

I just get the original series values (1, 2, 3) back.

I know I could instead write test.sum() and get the desired summary statistic. But I'm interested in better understanding the technical reason why .apply() doesn't seem to permit generating one for an isolated series.

I know that a series is size-immutable, but that shouldn't be consequential here because I'm reassigning the .apply() output to a new object. So any details on better understanding this would be appreciated!

Solution

A dataframe is two-dimensional, thus you can apply some functionality along either dimension (i.e. on either rows or columns). As a series is one-dimensional, the functionality is applied on each element (i.e. value) of the series. Of course, if the entries of the series themselves have dimensions, then summation could be a useful function to apply, but if they are just numbers, applying summation is just the identity, as there is no dimension to summarize (i.e. sum) over.
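
The explanation above can be checked with a small sketch. The variable names (elementwise, total, nested, row_sums) are illustrative only, and the nested series is a made-up example of entries that "themselves have dimensions".

```python
import numpy as np
import pandas as pd

test = pd.Series([1, 2, 3])

# np.sum is called once per element, so each scalar is "summed" to itself.
elementwise = test.apply(np.sum)

# Reducing over the whole series needs a series-level method instead.
total = test.sum()

# If each entry is itself a sequence, an element-wise sum is meaningful.
nested = pd.Series([[1, 2], [3, 4]])
row_sums = nested.apply(np.sum)

print(elementwise.tolist(), total, row_sums.tolist())
```

Here elementwise holds the same values as test, while total is the single number that test.sum() reduces to, and row_sums holds one sum per inner list.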

Answered By - Michael Hodel

Answer Checked By - Mildred Charles (PHPFixing Admin)

[FIXED] How to print percentages outside the pie with python

April 23, 2022 pie-chart, python-3.x, series No comments

Issue

I would like the percentage values to appear outside the pie. Maybe you can help.

Here is my code:

import matplotlib.pyplot as plt
import pandas as pd

# Named `data` rather than `dict` to avoid shadowing the built-in type.
data = {'a': 45, 'b': 123, 'c': 2, 'd': 1755, 'e': 13}
ser = pd.Series(data)

print(ser)

ser.plot(kind='pie', shadow=True, autopct='%1.2f%%')
plt.show()

As you can see, in my case the percentage values are not visible.

Solution

According to the docs pctdistance controls the "ratio between the center of each pie slice and the start of the text generated by autopct", and it has a default value of 0.6, which causes the percentage values to be inside the pie.

Try a pctdistance value > 1:

>>> ser.plot(kind='pie', shadow=True, autopct='%1.2f%%', pctdistance=1.15)

(The above results in an "ugly" plot in which the percentages and the labels overlap. You can fix that by "moving" the labels using labeldistance.)
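
A minimal sketch of combining the two parameters follows. It calls Matplotlib's Axes.pie directly rather than ser.plot, so that the generated text objects can be inspected; the value 1.4 for labeldistance is an arbitrary assumption, and the Agg backend line is only there so the sketch runs without a display.

```python
import math

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

ser = pd.Series({'a': 45, 'b': 123, 'c': 2, 'd': 1755, 'e': 13})

fig, ax = plt.subplots()
# With autopct set, Axes.pie returns the wedges, the label texts and the percentage texts.
wedges, label_texts, pct_texts = ax.pie(
    ser.values,
    labels=list(ser.index),
    autopct='%1.2f%%',
    pctdistance=1.15,   # > 1 places the percentages outside the pie
    labeldistance=1.4,  # assumed value; pushes the labels further out than the percentages
)

# Distance of each text from the pie centre, in units of the pie radius.
pct_radii = [math.hypot(*t.get_position()) for t in pct_texts]
label_radii = [math.hypot(*t.get_position()) for t in label_texts]
```

Very thin slices (such as c and e here) may still produce crowded text, so the two distances usually need some tuning for the data at hand. To view the result, call fig.savefig(...) or remove the backend line and use plt.show().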

Answered By - Nikolaos Chatzis

Answer Checked By - Pedro (PHPFixing Volunteer)
