PHPFixing

Thursday, October 6, 2022

[FIXED] How to resample a dataframe removing nan values?

Tags: data-science, nan, python, resampling, statistics

Issue

I have a large dataframe that is used for a binary classification task, not for time series. It contains two important feature columns in which more than 60% of the values are NaN. Rather than dropping those columns or shrinking the dataframe, is there a way to resample the data so that the NaNs are removed or replaced with synthetic values? I was thinking about the SMOTE package, but as far as I know it is meant for imbalanced datasets, not for NaNs. Could I use interpolation through NN, or would I risk generating misleading data?


Solution

There is no clear answer to this; it depends a lot on your data. If the two columns really are "important", as you say, how can they be so empty? What leads you to consider them important? You can easily fill them artificially with fillna, using a constant or an aggregate such as the column mean, but whether that is sensible depends on the domain. You can resort to SMOTE, but make sure you have enough data to generate sensible outputs.
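As a rough illustration of the fillna route, here is a minimal sketch using pandas. The helper name `impute_with_indicator`, the column names `feat_a` and `feat_b`, and the toy values are all made up for the example and do not come from the question. It fills each column with its median (or mean) and also keeps a flag recording which rows were originally empty, so a classifier can still see that the value was missing.

```python
import numpy as np
import pandas as pd


def impute_with_indicator(df, columns, strategy="median"):
    """Fill NaNs in `columns` and add a boolean `<col>_was_missing` flag for each.

    Hypothetical helper for illustration only; returns a new DataFrame.
    """
    out = df.copy()
    for col in columns:
        # Record which rows were missing before any filling happens.
        out[f"{col}_was_missing"] = out[col].isna()
        if strategy == "median":
            fill_value = out[col].median()  # NaNs are skipped by default
        elif strategy == "mean":
            fill_value = out[col].mean()
        else:
            raise ValueError(f"unknown strategy: {strategy!r}")
        out[col] = out[col].fillna(fill_value)
    return out


# Toy data standing in for the real dataframe (values are invented).
toy = pd.DataFrame(
    {
        "feat_a": [1.0, np.nan, 3.0, np.nan, 5.0],
        "feat_b": [np.nan, 2.0, np.nan, np.nan, 8.0],
        "label": [0, 1, 0, 1, 0],
    }
)

filled = impute_with_indicator(toy, ["feat_a", "feat_b"])
print(filled)
```

With more than 60% of the values missing, most of each filled column ends up being the same constant, so the flag column may well carry more signal than the filled values themselves. In a real pipeline the fill value would normally be computed on the training split only and then reused on the test split, to avoid leaking information.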



Answered By - rikyeah
Answer Checked By - Timothy Miller (PHPFixing Admin)