Issue

I am iterating over a number of columns and storing summary statistics (mean, median, skewness and kurtosis) for each one in a dict, as below:

metrics_dict['skewness'] = data_col.skew().values[0]
metrics_dict['kurtosis'] = data_col.kurt().values[0]
metrics_dict['mean'] = np.mean(data_col)[0]
metrics_dict['median'] = np.median(data_col)

However, for some columns it raises the following error:

IndexError: index out of bounds

The column in question is below:

Index          device
61021           C:2
61022          D:3+
61023          D:3+
61024           B:1
61025          D:3+
61026           C:2

I simply want to add NA to the dict for such a column, rather than have the error interrupt my loop. Here, Index is just the index of the dataframe, and the column being processed is device. Please note that the data has a large number of numeric columns (~500), of which 2-3 are like device, so I just need to add NA to the dict for these and move on to the next column. Can someone please tell me how to do that in Python?

Solution

Since these statistics are only meaningful for numeric columns, you can try isolating numeric columns. This is possible using pd.DataFrame.select_dtypes:

numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']

numeric_cols = df.select_dtypes(include=numerics).columns

for col in df:
    if col in numeric_cols:
        # calculate & add some values to dictionary
        pass
    else:
        # add NA values to dictionary
        pass
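As a rough sketch of how the two branches might be filled in, the example below computes the four statistics from the question for numeric columns and stores NaN for everything else. The function name `summarize_columns`, the nested-dict layout, and the sample data are assumptions made for illustration, not part of the original answer. It also passes `include="number"` to `select_dtypes` rather than the explicit dtype list above, so that other numeric widths (such as `int8`) are picked up too.

```python
import numpy as np
import pandas as pd

STAT_NAMES = ("mean", "median", "skewness", "kurtosis")


def summarize_columns(df):
    """Return {column: {stat: value}}, with NaN stats for non-numeric columns.

    Hypothetical helper: the name and return layout are illustrative only.
    """
    # Columns whose dtype is numeric (ints and floats of any width).
    numeric_cols = set(df.select_dtypes(include="number").columns)

    metrics = {}
    for col in df.columns:
        if col in numeric_cols:
            series = df[col]
            metrics[col] = {
                "mean": series.mean(),
                "median": series.median(),
                "skewness": series.skew(),
                "kurtosis": series.kurt(),
            }
        else:
            # Non-numeric column (e.g. device): record NaN and move on.
            metrics[col] = {name: np.nan for name in STAT_NAMES}
    return metrics


# Made-up sample data: one numeric column and one string column like device.
df = pd.DataFrame({
    "reading": [1.0, 2.0, 3.0, 4.0, 10.0],
    "device": ["C:2", "D:3+", "D:3+", "B:1", "D:3+"],
})
result = summarize_columns(df)
```

Because the dtype check happens before any statistic is computed, the loop never reaches the indexing step that raised the IndexError, so no try/except is needed for this case.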

Answered By - jpp

Answer Checked By - David Goodson (PHPFixing Volunteer)

Friday, October 28, 2022

[FIXED] how to replace empty series values with NaN in python
