Issue

ID                                     0x4607
Delivery_person_ID             INDORES13DEL02
Delivery_person_Age                 37.000000
Delivery_person_Ratings              4.900000
Restaurant_latitude                 22.745049
Restaurant_longitude                75.892471
Delivery_location_latitude          22.765049
Delivery_location_longitude         75.912471
Order_Date                         19-03-2022
Time_Orderd                             11:30
Time_Order_picked                       11:45
Weather conditions                      Sunny
Road_traffic_density                     High
Vehicle_condition                           2
Type_of_order                           Snack
Type_of_vehicle                    motorcycle
multiple_deliveries                  0.000000
Festival                                   No
City                                    Urban
Time_taken (min)                    24.000000
Name: 0, dtype: object

In an online exam, the machine learning training dataset has been split into multiple txt files. Each file contains data like the sample shown above. I don't understand how to read this data in Python and convert it to a pandas DataFrame. There are more than 45,000 txt files, each containing a single record of the dataset, and I need to merge them into a single .csv file. Any help will be highly appreciated.

Solution

Each of your txt files seems to contain only one row (printed as a pandas Series).

Unfortunately, these rows are not in a format that is easy for a machine to read: it looks like they were simply printed out and saved as-is.
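To see why splitting on two or more whitespace characters works for this format, here is a minimal plain-Python sketch that parses one file's text by hand. The helper name parse_printed_series is hypothetical. It assumes every line except the optional pandas footer has the form "field name, two or more spaces, value", which holds for the sample above because field names such as "Weather conditions" contain only single spaces.

```python
import re

# A few lines copied from the sample record in the question.
SAMPLE = """\
ID                                     0x4607
Weather conditions                      Sunny
Time_taken (min)                    24.000000
Name: 0, dtype: object
"""


def parse_printed_series(text):
    """Turn the text of one printed pandas Series into a {field: value} dict (hypothetical helper)."""
    lines = text.strip().splitlines()
    # The trailing "Name: ..., dtype: ..." line is pandas' footer, not data.
    if lines and lines[-1].lstrip().startswith(("Name:", "dtype:")):
        lines = lines[:-1]
    record = {}
    for line in lines:
        # Field names contain at most single spaces, so the first run of
        # two or more whitespace characters separates the name from the value.
        key, value = re.split(r"\s{2,}", line.strip(), maxsplit=1)
        record[key] = value
    return record


print(parse_printed_series(SAMPLE))
# {'ID': '0x4607', 'Weather conditions': 'Sunny', 'Time_taken (min)': '24.000000'}
```

Every value comes back as a string here, so numeric fields still need converting afterwards (for example with pd.to_numeric).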

Because of this, my solution does not read the original index of each row (the value after "Name:" in the last line of each file); instead, the final DataFrame is reindexed from 0.
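If you do want to keep those original labels, a minimal sketch follows that pulls the label out of the footer line. The helper name printed_series_name is hypothetical, and it assumes the footer looks exactly like the one in the question ("Name: 0, dtype: object").

```python
import re


def printed_series_name(text):
    """Return the label from a 'Name: ..., dtype: ...' footer, or None (hypothetical helper)."""
    lines = text.strip().splitlines()
    if not lines:
        return None
    # Assumes the footer matches the question's format, e.g. "Name: 0, dtype: object".
    match = re.fullmatch(r"Name: (.*), dtype: \w+", lines[-1].strip())
    return match.group(1) if match else None


print(printed_series_name("ID    0x4607\nName: 0, dtype: object\n"))  # 0
```

You could then, for example, collect these labels for each file (they come back as strings) and assign them to df.index instead of resetting the index.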

You'll have to iterate through all your files. For this example, I'm just using a hard-coded list of file names:

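import pandas as pd

file_names = ['file0.txt', 'file1.txt']

# sep: split on 2+ whitespace chars, so names like "Weather conditions" stay intact
# skipfooter=1: drop the trailing "Name: 0, dtype: object" line (needs engine='python')
# .iloc[:, 0]: turn the one-column DataFrame into a Series indexed by field name
rows = [pd.read_csv(file_name, sep=r'\s\s+', header=None, index_col=0,
                    skipfooter=1, engine='python').iloc[:, 0]
        for file_name in file_names]

# each Series becomes one row; reset_index replaces the repeated row labels with 0..n-1
df = pd.DataFrame(rows).reset_index(drop=True)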
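The code above stops once the DataFrame is built, while the goal in the question is one .csv file for all the records. Below is a minimal end-to-end sketch under a few assumptions: the files sit in one folder, they are named like file0.txt, file1.txt and so on, and you want them in numeric order (a plain sorted() would put file10.txt before file2.txt). The names merge_txt_records, natural_key and merged.csv are hypothetical, and the demo writes three made-up records to a temporary directory.

```python
import re
import tempfile
from pathlib import Path

import pandas as pd


def natural_key(path):
    """Sort key so 'file2.txt' comes before 'file10.txt' (hypothetical helper)."""
    # re.split with a capturing group alternates text and digit runs,
    # so every position compares str with str or int with int.
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r"(\d+)", path.name)]


def merge_txt_records(folder, out_csv):
    """Read every printed-Series .txt file in `folder` and write one CSV (hypothetical helper)."""
    file_names = sorted(Path(folder).glob("*.txt"), key=natural_key)
    rows = [
        pd.read_csv(f, sep=r"\s\s+", header=None, index_col=0,
                    skipfooter=1, engine="python").iloc[:, 0]
        for f in file_names
    ]
    df = pd.DataFrame(rows).reset_index(drop=True)
    df.to_csv(out_csv, index=False)  # the 0..n-1 index is not data, so leave it out
    return df


# Made-up demo records in the question's printed-Series format.
TEMPLATE = (
    "ID                                     {id}\n"
    "Weather conditions                      {weather}\n"
    "Time_taken (min)                    {minutes}\n"
    "Name: {name}, dtype: object\n"
)
DEMO = {
    0: ("0x4607", "Sunny", "24.000000"),
    2: ("0x0002", "Fog", "33.000000"),
    10: ("0x0003", "Windy", "26.000000"),
}

with tempfile.TemporaryDirectory() as tmp:
    for n, (rid, weather, minutes) in DEMO.items():
        text = TEMPLATE.format(id=rid, weather=weather, minutes=minutes, name=n)
        (Path(tmp) / f"file{n}.txt").write_text(text)
    out_csv = Path(tmp) / "merged.csv"
    merged = merge_txt_records(tmp, out_csv)
    round_trip = pd.read_csv(out_csv)  # read the CSV back to check it

print(round_trip)
```

For the real data you would point merge_txt_records at the folder holding the 45,000 files; the folder and output names are placeholders.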

Answered By - Vladimir Fokow

Answer Checked By - Willingham (PHPFixing Volunteer)

Friday, August 26, 2022

[FIXED] How to convert a variable spaced delimited file to a pandas dataframe
