Issue
I have server log data that looks something like this:
2014-04-16 00:01:31-0400,583 {"Items": [
{"UsageInfo"=>"P-1008366", "Role"=>"Abstract", "RetailPrice"=>2, "EffectivePrice"=>0},
{"Role"=>"Text", "ProjectCode"=>"", "PublicationCode"=>"", "RetailPrice"=>2},
{"Role"=>"Abstract", "RetailPrice"=>2, "EffectivePrice"=>0, "ParentItemId"=>"396487"}
]}
What I'd like is to load this into a relational database that connects two tables - a UsageLog table and a UsageLogItems table - linked by a primary key id (a rough DDL sketch follows below).
You can see that the UsageLog table would have fields like:
UsageLogId
Date
Time
and the UsageLogItems table would have fields like:
UsageLogId
UsageInfo
Role
RetailPrice
...
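For concreteness, a rough sketch of the two tables in Redshift DDL (the column types are my guesses from the sample data; note that Redshift treats primary and foreign keys as informational only and does not enforce them):

CREATE TABLE UsageLog (
    UsageLogId BIGINT NOT NULL,  -- generated id, shared with UsageLogItems
    Date       DATE,
    Time       TIMESTAMP
);

CREATE TABLE UsageLogItems (
    UsageLogId     BIGINT NOT NULL,  -- relates each item back to its UsageLog row
    UsageInfo      VARCHAR(32),
    Role           VARCHAR(32),
    RetailPrice    DECIMAL(10,2),
    EffectivePrice DECIMAL(10,2)
);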
However, I am having trouble writing these into Redshift while associating each record with a unique id that relates the two tables.
What I am currently doing: a Ruby script reads each line of the log file, parses out the UsageLog info (such as date and time), and writes it to the database one row at a time (single-row inserts into Redshift are VERY slow). It then builds a CSV of the UsageLogItems data and imports that into Redshift via S3, querying the largest id in the UsageLog table and using that number to relate the two. This step is also slow, because many UsageLogs contain no items at all, so I frequently load zero records from the CSV files.
This does work, but it is far too painfully slow to be effective. Is there a better way to handle this?
Solution
Amazon Redshift supports loading JSON directly through the COPY command, using a JSONPaths file to map JSON fields onto table columns. That lets you flatten each log line into JSON records, stage them in S3, and bulk-load everything with a single COPY instead of inserting rows one at a time:
http://docs.aws.amazon.com/redshift/latest/dg/copy-usage_notes-copy-from-json.html
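A minimal sketch of what that could look like for the UsageLogItems table. The bucket names, file paths, and IAM role below are placeholders, and it assumes the Ruby script is reworked to assign each log line a UsageLogId itself (e.g. a simple counter) and emit one flattened JSON object per item, so the id never has to be queried back out of Redshift:

{"UsageLogId": 42, "UsageInfo": "P-1008366", "Role": "Abstract", "RetailPrice": 2, "EffectivePrice": 0}

A JSONPaths file (e.g. s3://my-bucket/jsonpaths/usage_log_items.json) maps those fields to the table columns in order:

{
  "jsonpaths": [
    "$['UsageLogId']",
    "$['UsageInfo']",
    "$['Role']",
    "$['RetailPrice']",
    "$['EffectivePrice']"
  ]
}

Then one COPY loads the whole batch:

COPY UsageLogItems
FROM 's3://my-bucket/usage-log-items/2014-04-16.json'
IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-load-role'
JSON 's3://my-bucket/jsonpaths/usage_log_items.json';

Items missing a field (like EffectivePrice in the second sample row) should load as NULL. Do the same for the UsageLog table with its own data file and JSONPaths file, writing the same generated UsageLogId into both files; since Redshift does not enforce foreign keys, that shared id is all that relates the two tables, and log lines with zero items simply contribute nothing to the items file.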
Answered By - androboy