Monday, May 9, 2022

[FIXED] How to deal with missing values in Pandas

Issue

I'd like to know: when a dataset has missing values, what's the best way to treat them? Remove them directly, or replace them with zeros?

Suppose I have this data:

id  name  price  product_group
1   nd    14.35  care
2   nd    10.02  makeup
3   nd     5.40  nd
4   nd     7.68  nd

I need to analyse the data in the column 'product_group', and I tried to remove the 'nd' values with this code, but it doesn't work:

    order['product_group'] = order['product_group'].replace('nd', np.nan)
    order['product_group'] = order['product_group'].dropna(how='any')

Solution

You should call dropna() on the whole dataframe and restrict the check to the product_group column with subset:

    order['product_group'] = order['product_group'].replace('nd', np.nan)
    order = order.dropna(subset=['product_group'])

    #    id name  price product_group
    # 0   1   nd  14.35          care
    # 1   2   nd  10.02        makeup
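For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the table in the question, and the name `cleaned` is only an illustrative choice, not something from the original post:

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question; 'nd' marks a missing value.
order = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "name": ["nd", "nd", "nd", "nd"],
    "price": [14.35, 10.02, 5.40, 7.68],
    "product_group": ["care", "makeup", "nd", "nd"],
})

# Turn the 'nd' placeholder into a real missing value in this one column.
order["product_group"] = order["product_group"].replace("nd", np.nan)

# Drop whole rows where product_group is missing.
# Only the columns listed in subset are checked for NaN.
cleaned = order.dropna(subset=["product_group"])
print(cleaned)
```

Keeping the result in a separate variable leaves the original four rows in `order` untouched, which can be handy if you still want to look at the rows that were dropped.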

As for why your version didn't work, note that when you dropna() on the column by itself (without assigning back), that works fine:

    order['product_group'].dropna()

    # 0      care
    # 1    makeup
    # Name: product_group, dtype: object

But if you assign this shorter Series back into the full dataframe, pandas aligns it on the index. Rows 2 and 3 have no matching label in the Series, so pandas fills them with NaN again and you end up where you started.
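The assign-back behaviour can be checked with a small sketch. The two-column frame and the name `short` below are made up for illustration and start from a column that already contains NaN:

```python
import numpy as np
import pandas as pd

order = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "product_group": ["care", "makeup", np.nan, np.nan],
})

# dropna() on the column alone returns a shorter Series with index labels 0 and 1.
short = order["product_group"].dropna()

# Assigning it back matches rows by index label, not by position.
# Labels 2 and 3 are absent from `short`, so those rows receive NaN.
order["product_group"] = short

print(len(short))                              # 2
print(order["product_group"].isna().tolist())  # [False, False, True, True]
```

The dataframe still has four rows afterwards, which is why the original attempt looked like it did nothing.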



Answered By - tdy
Answer Checked By - Terry (PHPFixing Volunteer)
