Sunday, August 28, 2022

[FIXED] How to read csv with redundant characters as dataframe?

Issue

I have hundreds of comma-separated CSV files in which the decimal separator is also a comma. These files look like this:

ID,columnA,columnB
A,0,"15,6"
B,"1,2",0
C,0,

I am trying to read all these files in Python using pandas, but I am not able to separate the values properly into three columns, maybe because of the decimal separator or because some values are wrapped in quotation marks.

I first tried the code below, but even with different encodings I could not get the result I wanted:

import pandas as pd
df = pd.read_csv("test.csv", sep=",")

Could anyone help me? The result should be a dataframe like this:

  ID  columnA  columnB
0  A      0.0     15.6
1  B      1.2      0.0
2  C      0.0      NaN

Solution

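You just need to specify decimal="," so that pandas parses values such as "15,6" as floats. The quoted fields are already handled by the default quotechar ('"'), so a comma inside quotes is not treated as a column separator.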

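from io import StringIO

import pandas as pd

# Sample data: comma-separated fields, with comma decimals inside quotes
file = '''ID,columnA,columnB
A,0,"15,6"
B,"1,2",0
C,0,'''

# decimal="," makes pandas parse "15,6" as the float 15.6
df = pd.read_csv(StringIO(file), decimal=",")
print(df)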

Output:

  ID  columnA  columnB
0  A      0.0     15.6
1  B      1.2      0.0
2  C      0.0      NaN
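
Since the question mentions hundreds of files, the same call can be applied in a loop. The following is only a minimal sketch, assuming all files sit in one folder and share the same header. The function name read_decimal_comma_csvs, the pattern argument, and the source_file column are hypothetical and not part of the original answer.

```python
from pathlib import Path

import pandas as pd


def read_decimal_comma_csvs(folder, pattern="*.csv"):
    """Read every matching CSV in `folder` and stack the rows into one DataFrame.

    Hypothetical helper: assumes every file has the same columns.
    """
    frames = []
    for path in sorted(Path(folder).glob(pattern)):
        # decimal="," parses quoted values such as "15,6" as 15.6; the default
        # quotechar ('"') keeps the inner comma from splitting the field.
        frame = pd.read_csv(path, sep=",", decimal=",")
        # Illustrative extra column recording which file each row came from.
        frame["source_file"] = path.name
        frames.append(frame)
    # pd.concat raises ValueError if no file matched the pattern.
    return pd.concat(frames, ignore_index=True)
```

It could then be called as df = read_decimal_comma_csvs("data"), where "data" stands in for whatever folder holds the files.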


Answered By - BeRT2me
Answer Checked By - Marilyn (PHPFixing Volunteer)
