PHPFixing

Sunday, August 28, 2022

[FIXED] How can I drop NaN values as well as nearby non-NaN values from a df?

Tags: csv, dataframe, pandas, python

Issue

I have large CSVs (~100k rows x 30 cols). Occasionally the data has runs of NaN values of varying length. I need to drop the NaNs but also ~3 data points on either side, because the non-NaN data next to each gap is borked.

One could drop only the rows containing a NaN, but that would leave the bad neighbouring rows in place, and dropping much wider blocks would throw away more data than necessary.

How can I do this with Python? The data has already been loaded into a df.


Solution

Use:

import numpy as np
import pandas as pd

df = pd.DataFrame({'col': ['a', 'b', 'c', np.nan, 'd', 'e', np.nan, 's', 'r'],
                   'col1': 4})

print(df)
   col  col1
0    a     4
1    b     4
2    c     4
3  NaN     4
4    d     4
5    e     4
6  NaN     4
7    s     4
8    r     4

# flag rows that contain at least one missing value
m = df.isna().any(axis=1)

# also flag the row directly above and below each flagged row,
# combining the three masks with | (element-wise OR);
# then keep only the unflagged rows by inverting the mask with ~
df = df[~(m | m.shift(fill_value=False) | m.shift(-1, fill_value=False))]
print(df)
  col  col1
0   a     4
1   b     4
8   r     4

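Alternative solution (run on the original df, before the filtering above), which keeps a row only if it and both of its neighbours are complete:

# flag rows that have no missing values
m = df.notna().all(axis=1)

df = df[m & m.shift(fill_value=True) & m.shift(-1, fill_value=True)]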
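
Both versions above remove only one row on either side of each NaN row, while the question asks for about three. A minimal sketch of one way to widen the margin, assuming the same shift-based mask is simply repeated for offsets 1 to n. The helper name drop_nan_with_neighbors and its parameter n are hypothetical names introduced here and are not part of the original answer:

```python
import numpy as np
import pandas as pd


def drop_nan_with_neighbors(df, n=3):
    """Drop rows containing NaN plus the n rows before and after each of them.

    Hypothetical helper for illustration, not from the original answer.
    """
    # True for every row that has at least one missing value
    m = df.isna().any(axis=1)

    # Start from the NaN rows, then OR in the mask shifted by 1..n rows
    # in both directions so the neighbouring rows are flagged as well
    drop = m.copy()
    for k in range(1, n + 1):
        drop |= m.shift(k, fill_value=False) | m.shift(-k, fill_value=False)

    # Keep only the rows that were never flagged
    return df[~drop]


demo = pd.DataFrame({'x': [0.0, 1.0, 2.0, 3.0, 4.0, np.nan,
                           6.0, 7.0, 8.0, 9.0, 10.0, 11.0]})
result = drop_nan_with_neighbors(demo, n=3)
print(result.index.tolist())  # [0, 1, 9, 10, 11]
```

With n=1 this reduces to the mask used in the first solution. Because shift works by position, the result does not depend on the index labels, only on row order.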


Answered By - jezrael
Answer Checked By - Katrina (PHPFixing Volunteer)