PHPFixing
Sunday, August 28, 2022

[FIXED] How to access multiple CSV files that share the same name from multiple folders in a zip file

August 28, 2022 | Tags: csv, dataframe, pandas, python, zip

Issue

I have a zip file (stored locally) with multiple folders in it, and each folder contains a few CSV files. I only need one particular CSV from each folder. The CSVs I want all share the same name, but I cannot figure out how to read that one file from each folder and then concatenate them into a single pandas DataFrame.

I have tried the code below (initially trying to read all the CSVs):

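import glob
import os

import pandas as pd

path = r"C:\Users\...\Downloads\folder.zip"
# Note: glob cannot look inside a .zip archive, and the leading "/" in
# "/*.csv" makes os.path.join discard `path`, so all_files comes back empty.
all_files = glob.glob(os.path.join(path, "/*.csv"))

li = []

for filename in all_files:
    df = pd.read_csv(filename, index_col=None)
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)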

But I get: ValueError: No objects to concatenate. The CSVs are definitely present and not empty.

I am currently trying to do this in a SageMaker notebook, and I'm not sure whether that is also causing problems. Any help would be great.
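A minimal sketch of why the question's code ends up with an empty list, using only the standard library and pandas. The Windows and POSIX paths below are hypothetical stand-ins for the elided path in the question, and the tiny archive is built in a temporary folder purely so the example can run.

```python
import glob
import ntpath
import os
import posixpath
import tempfile
import zipfile

import pandas as pd

# A component starting with "/" counts as absolute, so join drops the
# zip path (ntpath keeps only the drive letter on Windows-style paths).
win = ntpath.join(r"C:\Users\me\Downloads\folder.zip", "/*.csv")
posix = posixpath.join("/home/me/folder.zip", "/*.csv")
print(win)    # C:/*.csv
print(posix)  # /*.csv

# Even with a correct join, glob only walks real directories; a .zip
# file is a single file, so nothing stored inside it is matched.
with tempfile.TemporaryDirectory() as tmp:
    zip_path = os.path.join(tmp, "folder.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("a/file.csv", "x\n1\n")
    matches = glob.glob(os.path.join(zip_path, "*", "*.csv"))
print(matches)  # []

# Concatenating the empty list is what raises the reported error.
message = ""
try:
    pd.concat(matches)
except ValueError as exc:
    message = str(exc)
print(message)  # No objects to concatenate
```

So the SageMaker environment is unlikely to be the cause; both the path join and the zip archive itself stop glob from finding anything.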


Solution

After some digging and advice from Umar.H and mad, I figured out a solution to my original question and to the code example I was originally working with.

The code I was originally working with couldn't read from the zip file directly, so I unzipped the file and tried it on a regular folder instead. The empty list of DataFrames (li) was fixed by changing the pattern "/*file.csv" in all_files to "*/*file.csv": the leading "/" made os.path.join discard the folder path, and the extra "*/" matches one level of subfolders, so glob finds the matching CSV inside each folder.
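The unzipped-folder approach can be sketched as a small helper. It assumes the extracted layout is one level of subfolders under a root folder; read_from_folders is a hypothetical name, and the throwaway a/ and b/ folders exist only to make the example runnable. Note that a pattern like *file.csv matches any name ending in file.csv, not only a file called exactly file.csv.

```python
import glob
import os
import tempfile

import pandas as pd


def read_from_folders(root, pattern="*file.csv"):
    """Read every CSV matching `pattern` exactly one folder level below `root`."""
    # The "*" component matches each subfolder name; glob does not recurse deeper.
    paths = sorted(glob.glob(os.path.join(root, "*", pattern)))
    frames = [pd.read_csv(p, index_col=None) for p in paths]
    return pd.concat(frames, axis=0, ignore_index=True)


# Build a throwaway extracted layout (hypothetical folders a/ and b/).
with tempfile.TemporaryDirectory() as root:
    for folder, value in [("a", 1), ("b", 2)]:
        os.makedirs(os.path.join(root, folder))
        with open(os.path.join(root, folder, "file.csv"), "w") as fh:
            fh.write(f"x\n{value}\n")
        # A second CSV in each folder that the pattern should skip.
        with open(os.path.join(root, folder, "other.csv"), "w") as fh:
            fh.write("y\n0\n")
    frame = read_from_folders(root)

print(frame["x"].tolist())  # [1, 2]
```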

My main goal, though, was to read all the required CSVs without unzipping the archive at all, and I got the following to work:

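import zipfile

import pandas as pd

PATH = "C:/Users/.../Downloads/folder.zip"

li = []
with zipfile.ZipFile(PATH, "r") as f:
    for name in f.namelist():
        # Keep members whose name ends in file.csv, whatever folder they are in
        if name.endswith("file.csv"):
            # Open the member as a file-like object without extracting it
            with f.open(name) as data:
                df = pd.read_csv(data, header=None, low_memory=False)
            li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)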
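If only members named exactly file.csv should be picked up, comparing the base name is stricter than endswith, which would also accept something like myfile.csv. A minimal sketch, assuming every target CSV has a header row (so it uses pandas' default header inference rather than header=None); read_named_csvs and the in-memory archive are hypothetical stand-ins for folder.zip.

```python
import io
import posixpath
import zipfile

import pandas as pd


def read_named_csvs(zip_source, target="file.csv"):
    """Concatenate every zip member whose file name is exactly `target`."""
    frames = []
    with zipfile.ZipFile(zip_source) as zf:
        for name in zf.namelist():
            # zipfile reports member names with "/" separators, so posixpath
            # extracts the bare file name regardless of the host OS.
            if posixpath.basename(name) == target:
                with zf.open(name) as fh:
                    frames.append(pd.read_csv(fh))
    return pd.concat(frames, axis=0, ignore_index=True)


# In-memory archive standing in for folder.zip.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a/file.csv", "x\n1\n")
    zf.writestr("a/myfile.csv", "x\n99\n")  # endswith("file.csv") would match this
    zf.writestr("b/file.csv", "x\n2\n")
buf.seek(0)

frame = read_named_csvs(buf)
print(frame["x"].tolist())  # [1, 2]
```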

Hope this can be helpful for anyone else with large zip files.



Answered By - Jasper_97
Answer Checked By - Mary Flores (PHPFixing Volunteer)