Issue
I'd like to know: when a dataset has missing values, what's the best way to treat them? Should I remove them directly or replace them with zeros?
Suppose I have this data:
id | name | price | product_group |
---|---|---|---|
1 | nd | 14.35 | care |
2 | nd | 10.02 | makeup |
3 | nd | 5.40 | nd |
4 | nd | 7.68 | nd |
I need to analyse the data in the 'product_group' column, so I tried to remove the 'nd' values with this code, but it doesn't work:
import numpy as np

order['product_group'] = order['product_group'].replace('nd', np.nan)
order['product_group'] = order['product_group'].dropna(how='any')
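To reproduce the problem, here is a minimal sketch that rebuilds the sample table by hand as a DataFrame named `order`. Constructing it with `pd.DataFrame` is an assumption; the real data was presumably loaded some other way. The two lines from the question are then run unchanged:

```python
import numpy as np
import pandas as pd

# Sample table from above, rebuilt by hand (assumed construction)
order = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "name": ["nd", "nd", "nd", "nd"],
    "price": [14.35, 10.02, 5.40, 7.68],
    "product_group": ["care", "makeup", "nd", "nd"],
})

# The two lines from the question, unchanged
order['product_group'] = order['product_group'].replace('nd', np.nan)
order['product_group'] = order['product_group'].dropna(how='any')

# 'nd' became NaN, but no rows were removed
print(len(order))                           # 4
print(order['product_group'].isna().sum())  # 2
```

The placeholder strings are converted to NaN, but the row count never changes.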
Solution
You should call dropna() on the whole dataframe and use subset to restrict it to the product_group column:
order['product_group'] = order['product_group'].replace('nd', np.nan)
order = order.dropna(subset=['product_group'])
# id name price product_group
# 0 1 nd 14.35 care
# 1 2 nd 10.02 makeup
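A self-contained version of the fix, using the same hand-built sample data (again an assumption about how `order` is created), which keeps only the rows whose product_group is known:

```python
import numpy as np
import pandas as pd

order = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "name": ["nd", "nd", "nd", "nd"],
    "price": [14.35, 10.02, 5.40, 7.68],
    "product_group": ["care", "makeup", "nd", "nd"],
})

# Mark the 'nd' placeholder as a real missing value, in product_group only
order['product_group'] = order['product_group'].replace('nd', np.nan)

# Drop whole rows where product_group is missing; other columns are not checked
order = order.dropna(subset=['product_group'])

print(order['id'].tolist())             # [1, 2]
print(order['product_group'].tolist())  # ['care', 'makeup']
```

Because `subset` limits the check to product_group, a NaN in some other column would not cause a row to be dropped. The surviving rows keep their original index labels (0 and 1 here); chain `.reset_index(drop=True)` if you want a fresh 0..n-1 index.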
As for why your version didn't work: calling dropna() on the column by itself (without assigning the result back) works fine:
order['product_group'].dropna()
# 0 care
# 1 makeup
# Name: product_group, dtype: object
But if you assign this shorter Series back into the full dataframe, pandas aligns it on the index: rows whose labels are missing from the Series (2 and 3 here) are filled with NaN, so the NaN values come back and no rows are removed.
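The index alignment described above can be checked directly. This sketch uses a cut-down `order` with only the product_group column (a simplification for illustration). It shows that assigning the shorter Series back restores NaN for the missing labels, then removes the rows with a boolean mask on `notna()`, a swapped-in alternative that has the same effect here as `dropna(subset=['product_group'])`:

```python
import numpy as np
import pandas as pd

# Cut-down frame: index labels 0..3, two missing product groups
order = pd.DataFrame({"product_group": ["care", "makeup", np.nan, np.nan]})

short = order['product_group'].dropna()
print(short.index.tolist())                    # [0, 1]

# Assignment aligns on index labels: 2 and 3 are absent from `short`,
# so pandas fills them with NaN and the frame keeps all four rows
order['product_group'] = short
rows_after_assign = len(order)
print(rows_after_assign)                       # 4
print(order['product_group'].isna().tolist())  # [False, False, True, True]

# Filtering the DataFrame itself is what actually removes rows
order = order[order['product_group'].notna()]
print(len(order))                              # 2
```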
Answered By - tdy Answer Checked By - Terry (PHPFixing Volunteer)