Issue
ID                             0x4607
Delivery_person_ID             INDORES13DEL02
Delivery_person_Age            37.000000
Delivery_person_Ratings        4.900000
Restaurant_latitude            22.745049
Restaurant_longitude           75.892471
Delivery_location_latitude     22.765049
Delivery_location_longitude    75.912471
Order_Date                     19-03-2022
Time_Orderd                    11:30
Time_Order_picked              11:45
Weather conditions             Sunny
Road_traffic_density           High
Vehicle_condition              2
Type_of_order                  Snack
Type_of_vehicle                motorcycle
multiple_deliveries            0.000000
Festival                       No
City                           Urban
Time_taken (min)               24.000000
Name: 0, dtype: object
For an online exam, a machine learning training dataset has been split across multiple txt files, each containing one record of the dataset in the format shown above. There are more than 45,000 such files. I cannot figure out how to read this data in Python and convert it to a pandas dataframe. I will then have to merge those 45,000 txt files into a single .csv file. Any help will be highly appreciated.
Solution
Each of your txt files seems to contain only one row of the dataset, printed as a pandas Series.
Unfortunately, these rows are not in an easy-to-read format (for machines): it looks like they were simply printed out and saved as text.
Because of this, my solution does not read the index of each Series (the Name value in the last row of each file): the final dataframe is simply reindexed.
You'll have to iterate through all your files. For this example, I'm using a hard-coded list of file names:
import pandas as pd

file_names = ['file0.txt', 'file1.txt']

# Split each line on runs of 2+ spaces, use the field names as the index,
# and drop the trailing "Name: 0, dtype: object" line with skipfooter.
rows = [pd.read_csv(file_name, sep=r'\s\s+', header=None, index_col=0,
                    skipfooter=1, engine='python').iloc[:, 0]
        for file_name in file_names]
df = pd.DataFrame(rows).reset_index(drop=True)
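Since the question also asks for a single merged .csv, here is a minimal end-to-end sketch. The directory, file names, and sample contents below are made up for illustration; in practice you would point the glob pattern at the folder holding your 45,000 files:

```python
import glob
import os
import tempfile

import pandas as pd

# Create two small sample files in the format shown above
# (placeholder data -- replace with your real directory of txt files).
sample = (
    "ID                             0x4607\n"
    "Delivery_person_Age            37.000000\n"
    "Time_taken (min)               24.000000\n"
    "Name: 0, dtype: object\n"
)
tmpdir = tempfile.mkdtemp()
for i in range(2):
    with open(os.path.join(tmpdir, f"file{i}.txt"), "w") as f:
        f.write(sample)

# Parse every txt file into a Series, then stack them into one dataframe.
rows = [
    pd.read_csv(path, sep=r"\s\s+", header=None, index_col=0,
                skipfooter=1, engine="python").iloc[:, 0]
    for path in sorted(glob.glob(os.path.join(tmpdir, "*.txt")))
]
df = pd.DataFrame(rows).reset_index(drop=True)

# Write the merged result to a single csv.
df.to_csv(os.path.join(tmpdir, "merged.csv"), index=False)
print(df.shape)  # one row per input file
```

Reading 45,000 files one by one with the python engine is slow but straightforward; sorting the glob results keeps the row order deterministic across runs.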
Answered By - Vladimir Fokow Answer Checked By - Willingham (PHPFixing Volunteer)