PHPFixing

Friday, August 26, 2022

[FIXED] How to convert a variable spaced delimited file to a pandas dataframe

Tags: csv, pandas, python-3.x

Issue

ID                                     0x4607
Delivery_person_ID             INDORES13DEL02
Delivery_person_Age                 37.000000
Delivery_person_Ratings              4.900000
Restaurant_latitude                 22.745049
Restaurant_longitude                75.892471
Delivery_location_latitude          22.765049
Delivery_location_longitude         75.912471
Order_Date                         19-03-2022
Time_Orderd                             11:30
Time_Order_picked                       11:45
Weather conditions                      Sunny
Road_traffic_density                     High
Vehicle_condition                           2
Type_of_order                           Snack
Type_of_vehicle                    motorcycle
multiple_deliveries                  0.000000
Festival                                   No
City                                    Urban
Time_taken (min)                    24.000000
Name: 0, dtype: object

In an online exam, the machine learning training dataset has been split into multiple txt files. Each file contains one record, in the format shown above. I don't understand how to read this data in Python and convert it to a pandas dataframe. There are more than 45,000 txt files, each holding a single record of the dataset, and I have to merge them into a single .csv file. Any help will be highly appreciated.


Solution

Each of your txt files seems to contain a single row of the dataset, printed as a pandas Series.

Unfortunately, this format is not easy for a machine to read: it looks like each row was simply printed and the output saved to a file.

Because of this, my solution does not read the original index of each row (the Name in the last line of each file); the final dataframe is reindexed from 0.
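If the original index does matter, the footer line can be parsed on its own. The following is only a minimal sketch, assuming the last line of every file has the form `Name: <label>, dtype: object` as in the sample above; the helper name `read_series_name` is hypothetical and not part of pandas.

```python
import re


def read_series_name(text):
    """Return the label from a trailing 'Name: <label>, dtype: ...' line, or None."""
    lines = text.strip().splitlines()
    if not lines:
        return None
    # The label is whatever sits between "Name:" and ", dtype:" on the last line.
    match = re.match(r"Name:\s*(.*?),\s*dtype:", lines[-1])
    return match.group(1) if match else None


# Made-up two-field record, shaped like the files in the question.
sample = "City    Urban\nTime_taken (min)    24.000000\nName: 0, dtype: object\n"
name = read_series_name(sample)
```

The label comes back as a string, so it would need an `int(...)` conversion before being used as a numeric index.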


You'll have to iterate through all your files. Just for my example, I'm using a list of the file names:

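import pandas as pd

file_names = ['file0.txt', 'file1.txt']

# Two or more whitespace characters separate a field name from its value;
# skipfooter=1 drops the trailing "Name: ..., dtype: object" line.
rows = [pd.read_csv(file_name, sep=r'\s\s+', header=None, index_col=0, skipfooter=1, engine='python').iloc[:, 0]
        for file_name in file_names]

# One row per file, renumbered from 0.
df = pd.DataFrame(rows).reset_index(drop=True)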
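For the full set of 45,000 files, the list of names can be collected from a folder and the result written to a single .csv. This is a sketch under assumptions: all records sit in one folder with a `.txt` extension, and the function names `read_record` and `merge_records` are made up for illustration. The block ends with a small self-check on two invented records.

```python
from pathlib import Path
import tempfile

import pandas as pd


def read_record(path):
    """Parse one printed-Series file into a pandas Series (footer line skipped)."""
    return pd.read_csv(
        path, sep=r"\s\s+", header=None, index_col=0,
        skipfooter=1, engine="python",
    ).iloc[:, 0]


def merge_records(folder, out_csv):
    """Read every *.txt file in `folder` and write the rows to one CSV file."""
    paths = sorted(Path(folder).glob("*.txt"))
    df = pd.DataFrame([read_record(p) for p in paths]).reset_index(drop=True)
    df.to_csv(out_csv, index=False)
    return df


# Self-check with two made-up records in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    for i, city in enumerate(["Urban", "Metropolitian"]):
        Path(tmp, f"rec{i}.txt").write_text(
            f"ID                  ID_{i}\n"
            f"City                {city}\n"
            f"Name: {i}, dtype: object\n"
        )
    merged = merge_records(tmp, Path(tmp, "merged.csv"))
```

Because each file mixes text and numbers in one column, the values will most likely come back as strings; reading the merged file again with `pd.read_csv` lets pandas infer a type per column.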


Answered By - Vladimir Fokow
Answer Checked By - Willingham (PHPFixing Volunteer)