
[FIXED] How do you consistently migrate a large MongoDB collection?

April 13, 2022 · migration, mongodb, mongoose

Issue

I was trying to migrate a large MongoDB collection of ~600k documents, like so:

    for await (const doc of db.collection('collection').find({
        legacyProp: { $exists: true },
    })) {
        // additional data fetching from separate collections here
        const newPropValue = await fetchNewPropValue(doc._id)
        await db.collection('collection').findOneAndUpdate(
            { _id: doc._id },
            [{ $set: { newProp: newPropValue } }, { $unset: ['legacyProp'] }]
        )
    }

When the migration script finished, data was still being updated for about 30 minutes. I concluded this by counting the documents that still contain the legacyProp property:

db.collection.countDocuments({ legacyProp: { $exists: true } })

The count kept decreasing on subsequent calls. After a while the updates stopped, and the final count of documents containing the legacy prop was around 300k, so the update failed silently, resulting in data loss. I'm curious what exactly happened and, most importantly, how do you update large MongoDB collections without any data loss? Keep in mind, there is additional data fetching involved before every update operation.


Solution

My first attempt would be to express the logic of fetchNewPropValue() in an aggregation pipeline.

Have a look at the Aggregation Pipeline Operators.
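
For example, here is a minimal sketch, not your actual fetchNewPropValue(): it assumes the new value can be joined from a hypothetical details collection by _id, and that you are on MongoDB 4.4+, where $merge may write back into the collection being aggregated:

db.getCollection('collection').aggregate([
   { $match: { legacyProp: { $exists: true } } },
   {
      // hypothetical source collection holding the new values
      $lookup: {
         from: 'details',
         localField: '_id',
         foreignField: '_id',
         as: 'detail'
      }
   },
   { $set: { newProp: { $first: '$detail.newPropValue' } } },
   { $unset: ['legacyProp', 'detail'] },
   {
      // write the transformed documents back in place
      $merge: {
         into: 'collection',
         on: '_id',
         whenMatched: 'replace',
         whenNotMatched: 'discard'
      }
   }
])

This keeps the whole migration server-side, so there are no per-document round trips that a long-running client cursor could silently drop.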

If this is not possible, then you can try to put all newPropValue values into an array and use it like this. 600k values should easily fit into your RAM.

// fetchNewPropValue() is assumed here to return all new values at once,
// as an array of { _id: ..., val: ... } objects
const newPropValues = await fetchNewPropValue()
db.getCollection('collection').updateMany(
   { legacyProp: { $exists: true } },
   [
      {
         // pick the array entry whose _id matches the current document
         $set: {
            newProp: {
               $first: {
                  $filter: { input: newPropValues, cond: { $eq: ["$_id", "$$this._id"] } }
               }
            }
         }
      },
      // unwrap the value and drop the legacy property
      { $set: { legacyProp: "$$REMOVE", newProp: "$newProp.val" } }
   ]
)
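
One caveat: this embeds the entire newPropValues array into the update command, and MongoDB caps a single command document at 16 MB of BSON, so with 600k entries it may be safer to split the update into batches.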

Or you can try bulkWrite:

// using the Node.js driver, as in the question, so that await is available
let bulkOperations = []
const cursor = db.collection('collection').find({ legacyProp: { $exists: true } })
for await (const doc of cursor) {
   const newPropValue = await fetchNewPropValue(doc._id)
   bulkOperations.push({
      updateOne: {
         filter: { _id: doc._id },
         update: {
            $set: { newProp: newPropValue },
            $unset: { legacyProp: "" }
         }
      }
   })
   // flush a batch of 10,000 operations at a time
   if (bulkOperations.length >= 10000) {
      await db.collection('collection').bulkWrite(bulkOperations, { ordered: false })
      bulkOperations = []
   }
}
// flush whatever is left after the cursor is exhausted
if (bulkOperations.length > 0) {
   await db.collection('collection').bulkWrite(bulkOperations, { ordered: false })
}
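
Since the original symptom was a silent partial failure, it may also be worth verifying each batch result. A minimal sketch, assuming the Node.js driver's BulkWriteResult (flushBatch is a hypothetical helper, not part of the answer above):

async function flushBatch(collection, operations) {
   const result = await collection.bulkWrite(operations, { ordered: false })
   // a matchedCount below the batch size means some update filters
   // found no document, i.e. updates were silently skipped
   if (result.matchedCount !== operations.length) {
      console.warn(`expected ${operations.length} matches, got ${result.matchedCount}`)
   }
   return result
}

As a final check, countDocuments({ legacyProp: { $exists: true } }) should return 0 once the migration is complete.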


Answered By - Wernfried Domscheit
Answer Checked By - Terry (PHPFixing Volunteer)