PHPFixing
  • Privacy Policy
  • TOS
  • Ask Question
  • Contact Us
  • Home
  • PHP
  • Programming
  • SQL Injection
  • Web3.0

Monday, July 25, 2022

[FIXED] How to read a json.gz file using Python?

 July 25, 2022     gzip, json, python     No comments   

Issue

EDIT: I have seen all of the questions on SA for this and they all give me the error I'm asking about here- please can you leave it open so I can get some help?

I have a file I can read very simply with Bash like this: gzip -d -c my_file.json.gz | jq . This confirms that it is valid JSON. But when I try to read it using Python like so:

import json
import gzip
with gzip.open('my_file.json.gz') as f:
    data = f.read() # returns a byte string `b'`
json.loads(data)

I get the error:

json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 1632)

But I know it is valid JSON from my Bash command. I have been stuck on this seemingly simple problem for a long time now and have tried everything it feels like. Can anyone help? Thank you.


Solution

Like the documentation tells you, gzip.open() returns a binary file handle by default. Pass in an rt mode to read the data as text:

with gzip.open("my_file.json.gz", mode="rt") as f:
    data = f.read()

... or separately .decode() the binary data (you then obviously have to know or guess its encoding).

If your input file contains multiple JSON records on separate lines (called "JSON lines" or "JSONS"), where each is separately a valid JSON structure, jq can handle that without any extra options, but Python's json module needs you to specify your requirement in more detail, perhaps like this:

with gzip.open("my_file.json.gz", mode="rt") as f:
    data = [json.loads(line) for line in f]


Answered By - tripleee
Answer Checked By - Pedro (PHPFixing Volunteer)
  • Share This:  
  •  Facebook
  •  Twitter
  •  Stumble
  •  Digg
Newer Post Older Post Home

0 Comments:

Post a Comment

Note: Only a member of this blog may post a comment.

Total Pageviews

Featured Post

Why Learn PHP Programming

Why Learn PHP Programming A widely-used open source scripting language PHP is one of the most popular programming languages in the world. It...

Subscribe To

Posts
Atom
Posts
Comments
Atom
Comments

Copyright © PHPFixing