PHPFixing
  • Privacy Policy
  • TOS
  • Ask Question
  • Contact Us
  • Home
  • PHP
  • Programming
  • SQL Injection
  • Web3.0

Sunday, August 28, 2022

[FIXED] How to write a table encoded as a list of dictionaries directly to a zipped archive containing a CSV?

 August 28, 2022     csv, python, sparse-matrix, zip     No comments   

Issue

Suppose you have data in the form of a list of dictionaries like d here:

d = [{'a' : 1, 'b' : 2}, {'a' : 3, 'c' : 5}]

and you want to save it as a comma-separated table to a zipped (not gzipped, I mean a .zip archive) CSV without going via, e.g., a pandas.DataFrame.from_dict().

Why not via pandas? Because d in real practice may correspond to a very large, but especially sparse, DataFrame, i.e. a table with many more columns than non-NA data per row, which for some reason occupies a huge amount of memory (BTW this is not a theory: it made our scripts crash several times, hence our need to work around it).

d is a sort of unpivoted-in-disguise version of the data, because each dictionary only contains the relevant data, not a useless sequence of NA's.

From the csv module's documentation I learned how to write d directly to a CSV:

with open('test.csv', 'w') as csvfile :
    writer = csv.DictWriter(csvfile, fieldnames = ['a','b','c'])
    writer.writeheader()
    writer.writerows(d)

but I don't see any option to write to a zipped CSV.

I consulted the documentation of zipfile, but I could not make it work, due to the usual problem between text and bytes.

if os.path.exists('test.csv.zip') :
    os.remove('test.csv.zip')
with zipfile.ZipFile('test.csv.zip', mode = 'a') as zip :
    with zip.open('test.csv', 'w') as csvfile :
        writer = csv.DictWriter(csvfile, fieldnames = ['a','b','c'])
        writer.writeheader()
        writer.writerows(d)

# TypeError: a bytes-like object is required, not 'str'

Can anyone think of a workaround, or maybe a radically different approach that I am not seeing?

The fundamental constraints are:

  1. d is always going to be generated: this we cannot decide or change
  2. avoid generating very large objects that consume as much memory or disk space as the dense pandas.DataFrame.from_dict()
  3. the data must be written to a csv.zip archive.

Otherwise we would write to a CSV, hoping that it is not too huge (but yeah, that was the initial issue, so...), and zip it afterwards.


EDIT posting the implementation from Daweo's answer, for completeness.

import os
import zipfile
import csv
import codecs
utf8 = codecs.getwriter('utf_8') # or other encoding dictated by requirements

output_zip_file = 'test.csv.zip'

if os.path.exists(output_zip_file) :
    os.remove(output_zip_file)
with zipfile.ZipFile(output_zip_file, mode = 'a') as zip :
    with zip.open('out.csv', 'w') as csvfile :
        writer = csv.DictWriter(utf8(csvfile), fieldnames = ['a','b','c'])
        writer.writeheader()
        writer.writerows(d)

Solution

You might use codecs.StreamWriter if you want to use csv.DictWriter with binary file-handle, consider following simple example

import csv
import codecs
utf8 = codecs.getwriter('utf_8') # or other encoding dictated by requirements
with open("file.csv","wb") as f:
    writer = csv.DictWriter(utf8(f), fieldnames = ['a','b','c'])
    writer.writeheader()
    writer.writerows([{'a':1},{'b':2},{'c':3}])

creates file.csv holding

a,b,c
1,,
,2,
,,3


Answered By - Daweo
Answer Checked By - Candace Johnson (PHPFixing Volunteer)
  • Share This:  
  •  Facebook
  •  Twitter
  •  Stumble
  •  Digg
Newer Post Older Post Home

0 Comments:

Post a Comment

Note: Only a member of this blog may post a comment.

Total Pageviews

Featured Post

Why Learn PHP Programming

Why Learn PHP Programming A widely-used open source scripting language PHP is one of the most popular programming languages in the world. It...

Subscribe To

Posts
Atom
Posts
Comments
Atom
Comments

Copyright © PHPFixing