PHPFixing

Thursday, August 11, 2022

[FIXED] How to change the decimal format from dot to comma when reading parquet files?

Tags: decimal, pandas, parquet, python

Issue

I'm working with parquet files and reading them with pd.read_parquet(). However, the numeric values in the file use a comma as the decimal separator, so the numbers are being misread.

How can I change the decimal sign from dot to comma?

Here is my code:

import pandas as pd
import pyarrow.parquet as pq

new_col = pq.read_table(filepath).to_pandas()
aux = pd.concat([aux, new_col])

df.head()

                      X_Principal  Y_Principal  value_main  \
ts                                                                     
2016-01-27 15:15:00             1.0             4.0        11.020800   
2016-01-27 15:15:00             1.0             4.0        11.020800   
2016-01-27 15:15:00             1.0             4.0        36.408001   
2016-01-27 15:15:00             1.0             4.0        36.408001   
2016-01-27 15:30:00             1.0             4.0        12.004800 

type(new_col)

<class 'pandas.core.frame.DataFrame'>  

The numbers in the value_main column should be something like 110.20800, for example.


Solution

Let's run a minimal reproducible experiment.

Let's prepare some data:

In [1]: df = pd.DataFrame({"a": ["1,1", "1,2"], "b": [1, 2]})

In [2]: df.to_parquet("./df.parquet", compression="GZIP") 

Let's check what the file actually contains:

$ parquet-cat df.parquet
a = 1,1
b = 1

a = 1,2
b = 2

Then let's read the data back and cast the column in question to float:

In [8]: df1 = pd.read_parquet("./df.parquet")                                                                                          

In [9]: df1                                                                                                                            
Out[9]: 
     a  b
0  1,1  1
1  1,2  2

In [10]: df1.a.str.replace(",",".").astype("float64")                                                                                  
Out[10]: 
0    1.1
1    1.2
Name: a, dtype: float64

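As you can see, this works on a parquet file whose values use a comma as the decimal separator. Note that such values are stored as strings, which is why the column has to be converted explicitly after reading.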
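If several columns need the same treatment, the replace-and-cast step can be wrapped in a small helper. This is only a sketch: the function name comma_decimals_to_float and its columns argument are made up for illustration, and it assumes every value in the listed columns is a string such as "1,1" with no thousands separators.

```python
import pandas as pd


def comma_decimals_to_float(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Return a copy of df with the given string columns parsed as float64.

    Hypothetical helper: assumes values look like "1,1" (comma as the
    decimal separator, no thousands separators).
    """
    out = df.copy()
    for col in columns:
        # regex=False makes "," a literal character rather than a pattern.
        cleaned = out[col].str.replace(",", ".", regex=False)
        # astype raises ValueError if a value is still not a valid number.
        out[col] = cleaned.astype("float64")
    return out


# Same toy data as in the experiment above.
df = pd.DataFrame({"a": ["1,1", "1,2"], "b": [1, 2]})
converted = comma_decimals_to_float(df, ["a"])
print(converted.dtypes)
```

The original frame is left untouched, so the result can be compared with the raw strings before deciding to keep it.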

PS

The data you added to your question does not quite match the question itself: the values shown already use a dot as the decimal separator. I suggest looking more closely at what is actually stored in the parquet file, with a tool like parquet-tools, and checking whether it is read correctly.



Answered By - Sergey Bushmanov
Answer Checked By - Senaida (PHPFixing Volunteer)