Issue

I'd like to know: when a dataset has missing values, what's the best way to treat them? Should I remove them directly or replace them with zeros?

Suppose I have this data:

id	name	price	product_group
1	nd	14.35	care
2	nd	10.02	makeup
3	nd	5.40	nd
4	nd	7.68	nd

I need to analyse the data in the column 'product_group' and tried to remove the 'nd' values using this code, but it doesn't work.

    order['product_group'] = order['product_group'].replace('nd', np.nan)
    order['product_group'] = order['product_group'].dropna(how='any')

Solution

Call dropna() on the whole dataframe and use subset to restrict the check to the product_group column:

    order['product_group'] = order['product_group'].replace('nd', np.nan)
    order = order.dropna(subset=['product_group'])

    #    id name  price product_group
    # 0   1   nd  14.35          care
    # 1   2   nd  10.02        makeup
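
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample table from the question by hand (the construction of `order` is an assumption about how the data was loaded), applies the approach above to a copy named `cleaned`, and also shows a boolean-mask filter as an alternative that is not part of the original answer. The names `cleaned` and `masked` are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; 'nd' marks a missing value.
order = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "name": ["nd", "nd", "nd", "nd"],
    "price": [14.35, 10.02, 5.40, 7.68],
    "product_group": ["care", "makeup", "nd", "nd"],
})

# Approach from the answer: mark 'nd' as NaN, then drop the rows
# where product_group is NaN. Other columns are not checked.
cleaned = order.copy()
cleaned["product_group"] = cleaned["product_group"].replace("nd", np.nan)
cleaned = cleaned.dropna(subset=["product_group"])

# Alternative (not from the answer): keep only rows whose value is not 'nd'.
masked = order[order["product_group"] != "nd"]

print(cleaned)
print(masked["product_group"].tolist())
```

Both variants keep the rows with id 1 and 2. The mask skips the NaN round trip, while the replace-then-dropna version is handy if other code should also see the values as missing.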

As for why your version didn't work, note that when you dropna() on the column by itself (without assigning back), that works fine:

    order['product_group'].dropna()

    # 0      care
    # 1    makeup
    # Name: product_group, dtype: object

But if you assign this shorter Series back into the full dataframe, pandas aligns it on the index, so the rows that are missing from the Series (here rows 2 and 3) are simply filled with NaN again.
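
The index alignment can be seen in a small sketch. It assumes a one-column frame that already contains NaN in place of 'nd'; the variable name `short` is only illustrative.

```python
import numpy as np
import pandas as pd

# One-column frame where 'nd' has already been replaced by NaN.
order = pd.DataFrame({"product_group": ["care", "makeup", np.nan, np.nan]})

# dropna() on the column alone returns a shorter Series with index labels 0 and 1.
short = order["product_group"].dropna()
print(short.index.tolist())

# Assigning it back aligns on index labels: rows 2 and 3 have no match
# in `short`, so they are filled with NaN and the frame keeps all 4 rows.
order["product_group"] = short
print(order["product_group"].isna().tolist())
```

The frame still has four rows after the assignment, which is why the rows have to be dropped from the dataframe itself rather than from a single column.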

Answered By - tdy

Answer Checked By - Terry (PHPFixing Volunteer)

Monday, May 9, 2022

[FIXED] How to deal with missing values in Pandas
