Issue
EDIT: I have seen all of the questions on SO about this, and their answers all give me the same error I'm asking about here. Please leave this open so I can get some help.
I have a file I can read very simply with Bash like this:
gzip -d -c my_file.json.gz | jq .
This confirms that it is valid JSON. But when I try to read it using Python like so:
import json
import gzip
with gzip.open('my_file.json.gz') as f:
    data = f.read()  # returns bytes (b'...'), because the default mode is binary
json.loads(data)
I get the error:
json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 1632)
But I know it is valid JSON from my Bash command. I have been stuck on this seemingly simple problem for a long time and feel like I have tried everything. Can anyone help? Thank you.
Solution
As the documentation explains, gzip.open() returns a binary file handle by default. Pass mode="rt" to read the data as text:
with gzip.open("my_file.json.gz", mode="rt") as f:
    data = f.read()
... or call .decode() on the binary data separately (you then have to know or guess its encoding).
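A minimal sketch of the .decode() route, for a file that holds a single JSON document. The helper name load_gzipped_json and the UTF-8 default are assumptions made for illustration; use whatever encoding your file was actually written with.

```python
import gzip
import json


def load_gzipped_json(path, encoding="utf-8"):
    """Read a gzip-compressed file that holds a single JSON document.

    Hypothetical helper; the UTF-8 default is an assumption about the file.
    """
    # Open in binary mode, so read() returns bytes.
    with gzip.open(path, mode="rb") as f:
        raw = f.read()
    # Decode explicitly instead of relying on a platform default.
    text = raw.decode(encoding)
    return json.loads(text)
```

Called as load_gzipped_json("my_file.json.gz"), this still raises JSONDecodeError if the file contains more than one JSON document.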
If your input file contains multiple JSON records on separate lines (known as "JSON Lines" or "JSONL"), where each line is a valid JSON structure on its own, jq can handle that without any extra options. Python's json.loads() expects a single document per call, which is what the "Extra data" error indicates: it found more content after the first complete JSON value. Parse each line separately instead, perhaps like this:
with gzip.open("my_file.json.gz", mode="rt") as f:
    data = [json.loads(line) for line in f]
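If the file may contain blank lines, or you would rather not hold every record in memory at once, the same idea can be sketched as a generator. The name iter_json_lines and the UTF-8 default are assumptions for illustration, not part of the original answer.

```python
import gzip
import json


def iter_json_lines(path, encoding="utf-8"):
    """Yield one parsed record per non-empty line of a gzipped JSON Lines file.

    Hypothetical helper; the UTF-8 default is an assumption about the file.
    """
    with gzip.open(path, mode="rt", encoding=encoding) as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines, e.g. a trailing newline
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                # Report which line failed, to make bad records easy to find.
                raise ValueError(
                    f"Invalid JSON on line {line_number}: {exc}"
                ) from exc
```

For example, data = list(iter_json_lines("my_file.json.gz")) collects all records into a list, while a plain for loop over the generator processes them one at a time.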
Answered By - tripleee
Answer Checked By - Pedro (PHPFixing Volunteer)