af83

Document-oriented database and migration

Document-oriented databases like MongoDB, and other schemaless stores like Redis, are databases we rely on at af83. Being schemaless is one of their defining features. Not having to explicitly define a schema for your database can be a real asset for projects where flexibility is a prerequisite.

For most of your projects, at one point or another, the word "migration" will come up.

Schema Migration

A schema migration creates, destroys or updates the schema of a database. Unlike SQL-powered projects, projects built on a document-oriented database do not need schema migrations.

Yet…

Even if a schema migration is not needed, a data migration can be.

Data Migration

OK, so schema migrations are not needed, but when you change the structure of the documents in a collection, you are eventually going to need to update the existing data.

The big question is: how and when should you make those data migrations?

A common solution is to reuse a mechanism built for schema migrations: the migration code runs once, at deploy time. In some cases, this can lead to unexpected behavior.

For example, users are stored in one collection where you already keep some classic data like email, username, addresses and so on. Here's a new feature: geolocating users' addresses. The feature itself is quite easy to implement; you just have to add some new fields and a few calls to an external API.

However, updating the user base is mandatory (or you might find that all of your users are somewhere off the coast of Africa, at coordinates 0.0, 0.0), and this user base can be quite large. Updating all those users during a deploy will take quite some time, and you probably only want to update active users anyway.

Lazy Migration

The main idea of lazy migration is to migrate only when it's needed. Data is not updated on deploy, which would imply iterating over every object in a collection. Instead, when an object is fetched from the database, its state is checked and a migration is run if required. The access can very well be a user logging in or your indexing job visiting the object (if the object is dirty, there is a good chance it needs reindexing anyway).
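To make the idea concrete, here is a minimal sketch in plain Ruby, without any database or gem. Everything in it is an assumption made for illustration: documents are plain hashes carrying a "version" key, STORE stands in for a collection, and the names MIGRATIONS, migrate_lazily and fetch_user are hypothetical.

```ruby
# Hypothetical stand-in for a stored document: a plain Hash with a "version" key.
CURRENT_VERSION = 3

# Each entry upgrades a document from version N to N + 1 (illustrative steps).
MIGRATIONS = {
  1 => ->(doc) { doc.merge("addresses" => Array(doc["address"])) },
  2 => ->(doc) { doc.merge("coordinates" => nil) } # placeholder for a geocoding call
}.freeze

# Bring a single document up to CURRENT_VERSION, one step at a time.
def migrate_lazily(doc)
  version = doc.fetch("version", 1)
  while version < CURRENT_VERSION
    doc = MIGRATIONS.fetch(version).call(doc)
    version += 1
    doc = doc.merge("version" => version)
  end
  doc
end

# Simulated collection: one stale document, one that is already up to date.
STORE = {
  "alice" => { "version" => 1, "address" => "Paris" },
  "bob"   => { "version" => 3, "addresses" => ["Lyon"], "coordinates" => nil }
}

# The migration runs only for the document that is actually read.
def fetch_user(id)
  doc = migrate_lazily(STORE.fetch(id))
  STORE[id] = doc # persist the upgraded document
  doc
end
```

Calling fetch_user("alice") upgrades and stores only that document. Nothing else in STORE is touched until it is read in turn, which is the whole point: inactive users never cost you anything.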

Paresseux is a small Ruby gem to help you with this issue.

class User
  # Define current version of mapping at the very top.
  VERSION = 3

  include Mongoid::Document
  include Mongoid::Paresseux
end

class UserParesseux < Paresseux::Migration
  def migration_1_to_2
    # updating data with your own code
  end

  def migration_2_to_3
    # updating data with your own code
  end

  # and so on...
end

When a user that was created when VERSION was 2 is fetched from the database, the migration_2_to_3 method from UserParesseux is run. In our example, it would fetch the geolocation information and store it.
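How the right methods get picked can be sketched with a naming convention and a bit of dynamic dispatch. This is not Paresseux's actual implementation: the LazyMigration base class, the run method, its from: and to: arguments, and passing the record to each step are all assumptions made for this example.

```ruby
# Hypothetical base class; NOT Paresseux's actual implementation.
class LazyMigration
  # Runs migration_N_to_N+1 for each version step between `from` and `to`.
  def run(record, from:, to:)
    (from...to).each do |version|
      step = "migration_#{version}_to_#{version + 1}"
      raise NotImplementedError, "missing #{step}" unless respond_to?(step)

      public_send(step, record)
      record[:version] = version + 1
    end
    record
  end
end

class UserMigration < LazyMigration
  def migration_1_to_2(user)
    # Illustrative step: derive a username from the email address.
    user[:username] = user[:email].split("@").first
  end

  def migration_2_to_3(user)
    # A real step would call a geocoding service here; this value is a stub.
    user[:coordinates] = :pending
  end
end

# A user stored at version 2 only goes through migration_2_to_3.
user = { version: 2, email: "jane@example.com", username: "jane" }
UserMigration.new.run(user, from: user[:version], to: 3)
```

A user stored at version 1 would go through both steps in order, so each migration only ever has to know about the version immediately before it.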

One last thing about Paresseux: it is still at an experimental stage. Only Mongoid is supported at the moment, so bug reports and pull requests are more than welcome.

With this solution, the code responsible for data migration is clearly identified: it can live in an app/migrations folder in a Rails project (for example) instead of being hidden in some rake task forgotten by everybody. It belongs in your application, which means it can (and even should) be tested.

How do you deal with these issues on your projects? Do you use lazy migrations?
