PHPFixing

Tuesday, August 30, 2022

[FIXED] How can I import a very large csv into dynamodb?

 August 30, 2022     amazon-dynamodb, amazon-s3, csv

Issue

So I have a very large csv file in my S3 bucket (2 mil+ lines) and I want to import it into DynamoDB.

What I tried:

  1. Lambda: I managed to get the Lambda function to work, but only around 120k lines were imported to DynamoDB before the function timed out.

  2. Pipeline: When using the pipeline, it got stuck on "waiting for runner" and then stopped completely.


Solution

Here's a serverless approach that processes the large .csv in small chunks with two Lambdas and an SQS queue (minimal sketches of the steps follow the list below):

  1. Using a one-off Reader Lambda, extract the primary keys of all records with an S3 Select query (SELECT s.primary_key FROM S3Object s), which queries the .csv in place. See the SelectObjectContent API for details.
  2. The Reader Lambda puts the primary keys into an SQS queue. Add a Dead Letter Queue to capture errors.
  3. Add the queue as a Writer Lambda's event source. Enable batching. Limit concurrency if desired.
  4. Parallel Writer Lambda invocations each fetch the records for their batch of primary keys from the .csv using S3 Select: SELECT * FROM S3Object s WHERE s.primary_key IN ('id1', 'id2', 'id3')
  5. The Writer Lambda writes its batch of records to the DynamoDB table.
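
To make steps 1 and 2 concrete, here is a minimal sketch of the Reader Lambda in Python (boto3). The bucket, object key, column name primary_key, queue URL and chunk size are placeholder assumptions, not values from the question:

    # Reader Lambda (steps 1-2): stream every primary key out of the .csv with
    # S3 Select, then queue them in small chunks for the Writer Lambda.
    import json
    import boto3

    s3 = boto3.client("s3")
    sqs = boto3.client("sqs")

    BUCKET = "my-bucket"            # assumption: your bucket name
    CSV_KEY = "data/large.csv"      # assumption: your object key
    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/csv-keys"  # assumption

    def handler(event, context):
        # S3 Select runs the SQL against the object in place; only the key
        # column comes back over the wire.
        resp = s3.select_object_content(
            Bucket=BUCKET,
            Key=CSV_KEY,
            ExpressionType="SQL",
            Expression="SELECT s.primary_key FROM S3Object s",
            InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
            OutputSerialization={"CSV": {}},
        )

        # The response payload is an event stream; collect the record parts.
        # For 2M+ rows, consider flushing chunks to SQS as they arrive instead
        # of buffering everything in memory.
        raw = "".join(
            part["Records"]["Payload"].decode("utf-8")
            for part in resp["Payload"]
            if "Records" in part
        )
        keys = [line.strip() for line in raw.splitlines() if line.strip()]

        # One SQS message per chunk of 25 keys (one DynamoDB batch write's worth).
        for i in range(0, len(keys), 25):
            sqs.send_message(QueueUrl=QUEUE_URL,
                             MessageBody=json.dumps(keys[i:i + 25]))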

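For step 3, the event source mapping can be created in the Lambda console, or with a one-off boto3 call like the sketch below (the queue ARN, function name and concurrency cap are assumptions):

    # One-time setup (step 3): wire the SQS queue to the Writer Lambda with
    # batching enabled and, optionally, a cap on parallel invocations.
    import boto3

    lambda_client = boto3.client("lambda")

    lambda_client.create_event_source_mapping(
        EventSourceArn="arn:aws:sqs:us-east-1:123456789012:csv-keys",  # assumption
        FunctionName="csv-writer",                                     # assumption
        BatchSize=10,                              # up to 10 SQS messages per invocation
        ScalingConfig={"MaximumConcurrency": 20},  # optional: limit concurrent writers
    )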

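And a matching sketch of the Writer Lambda for steps 4 and 5. Again, the bucket, object key, column and table names are placeholders; CSV values come back as strings, so convert types before writing if your table expects numbers:

    # Writer Lambda (steps 4-5): for each queued batch of primary keys, pull the
    # full rows back with S3 Select and batch-write them to DynamoDB.
    import json
    import boto3

    s3 = boto3.client("s3")
    table = boto3.resource("dynamodb").Table("my-table")   # assumption: table name

    BUCKET = "my-bucket"            # assumption
    CSV_KEY = "data/large.csv"      # assumption

    def handler(event, context):
        for record in event["Records"]:           # one SQS message per record
            keys = json.loads(record["body"])     # e.g. ["id1", "id2", ...]
            # Assumes simple IDs with no embedded quotes.
            id_list = ", ".join(f"'{k}'" for k in keys)

            resp = s3.select_object_content(
                Bucket=BUCKET,
                Key=CSV_KEY,
                ExpressionType="SQL",
                Expression=(
                    "SELECT * FROM S3Object s "
                    f"WHERE s.primary_key IN ({id_list})"
                ),
                InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
                OutputSerialization={"JSON": {}},   # one JSON object per line
            )

            rows = "".join(
                part["Records"]["Payload"].decode("utf-8")
                for part in resp["Payload"]
                if "Records" in part
            )

            # batch_writer handles the 25-item BatchWriteItem limit and retries.
            with table.batch_writer() as batch:
                for line in rows.splitlines():
                    if line.strip():
                        batch.put_item(Item=json.loads(line))
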
Answered By - fedonev
Answer Checked By - Willingham (PHPFixing Volunteer)