A few weeks ago I attended CouchConf in Berlin and during the sessions (and in between) one topic was raised several times: How to migrate data between “schemas” or document versions. I described how we are migrating documents and I want to take a moment to explain the process in more detail. It might sound trivial but there was interest in the description during the conference, so I’m hoping it may prove helpful for others nonetheless.
Since CouchDb is inherently unstructured, there’s no global schema that you manage to control your data’s structure. That’s often a good thing, because it gives you flexibility, but it can also cause problems, for example when you want to access documents without handling all sorts of different “versions” of your document you might have.
For example, say you have started out with an initial player document (we’re sticking with the RPG theme set in the Couchbase examples ;):
{
'version' : 1,
'name:' : 'Player A',
'xp': 1234
}
Now say during testing you find that you need to know a player’s level. You’ve decided that the it should always be xp/100 + 1 but you don’t want to recompute this all the time in code but rather store it in the document directly. For various other reasons you’ve also decided against creating a view and therefore you want to migrate all your documents to this format:
{
'version' : 2,
'name:' : 'Player A',
'xp' : 1234,
'level' : 13
}
Note that the initial document already included a version attribute that we’re using to keep track of our migrations but even if this weren’t the case from the start, it’s easy to simply treat documents without a version attribute as “version 0” so to speak and handle them similarly to the rest of this example.
So how do we migrate from version 1 to version 2 then?
The idea is to create a view that shows all old revision documents and process them until the view has no more items. The view would be defined with the following (trivial) map function:
function(doc) {
if (doc.version && doc.version == 1) {
emit(doc._id);
}
}
Now it’s simply a matter of processing all items in this view, for example with the following python-couchdb method that takes a database object as a parameter:
def migrate_v1_v2(db):
v1 = db.view('_design/migration/_view/v1')
for row in v1.rows:
doc = db[row.key]
if doc['version'] == 1:
doc['version'] = 2
# we want to add the level stat, which is simply xp/100, starting from 1
doc['level'] = doc['xp']/100 + 1
db[doc.id] = doc
where v1
is the name of the view we defined above.
The complete example in the form of a unit test is available on github. The only dependency is python-couchdb. It should be trivial to translate this pattern to other client libraries. It might also be useful to extend this concept to a migration framework á la Ruby on Rails.