opensoul.org

Merging Active Record models

August 21, 2008

We’ve been working on a project that involves importing a massive amount of data from multiple sources. The data is somewhat complicated, so we occasionally end up with duplicate records that need to be merged. The data is also highly normalized, so there are a bunch of associations that need to be merged as well. If you’ve ever done this by hand, you know how painful it can be.

To alleviate that pain, I introduce you to merger, a Rails plugin for merging Active Record models.

@person.merge!(Person.find_all_by_email(@person.email))

The plugin is pretty simple right now. All it does is:

  1. Given a set of records, picks the oldest record (the one with the lowest id) as the one to keep
  2. Moves any associated has_many and has_and_belongs_to_many (habtm) records from the duplicates to the record being kept
  3. Deletes the duplicate records
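
To make those three steps concrete, here is a rough sketch in plain Ruby. It is not the plugin's source: Person and Post are hypothetical stand-ins built without Active Record, posts plays the role of a has_many association, and destroy only flips a flag instead of touching a database.

```ruby
# Hypothetical stand-ins for Active Record models, for illustration only.
Post = Struct.new(:id, :person_id)

class Person
  attr_reader :id, :email, :posts

  def initialize(id, email, posts = [])
    @id = id
    @email = email
    @posts = posts
    @posts.each { |post| post.person_id = id }
    @destroyed = false
  end

  def destroyed?
    @destroyed
  end

  # Assumption: a flag stands in for deleting the database row.
  def destroy
    @destroyed = true
  end

  # Sketch of merge!, following the three steps described in the post.
  def merge!(*others)
    records = ([self] + others.flatten).uniq

    # 1. Keep the oldest record (the one with the lowest id).
    keeper = records.min_by(&:id)
    duplicates = records - [keeper]

    duplicates.each do |duplicate|
      # 2. Move the associated records over to the record being kept.
      duplicate.posts.each { |post| post.person_id = keeper.id }
      keeper.posts.concat(duplicate.posts)
      duplicate.posts.clear

      # 3. Delete the duplicate.
      duplicate.destroy
    end

    keeper
  end
end

original  = Person.new(1, "brandon@example.com", [Post.new(10)])
duplicate = Person.new(2, "brandon@example.com", [Post.new(11), Post.new(12)])

kept = duplicate.merge!([original, duplicate])
puts kept.id                  # id of the record that was kept
puts kept.posts.map(&:id).inspect
```

Note that the record you call merge! on is not necessarily the one that survives in this sketch. The lowest id wins, so the return value is the record to keep using.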

We intend to add a lot to it, including:

  • Strategies for choosing which record to keep
  • Strategies for merging the individual attributes of the records
  • Recursive merging of associations based on certain attributes
  • Options for what to do with the duplicate records
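
As a purely hypothetical illustration of the first item, a strategy for choosing which record to keep could be as small as a named lambda. Nothing below exists in the plugin; Record, KEEP_STRATEGIES, and choose_record_to_keep are made-up names, and Record is a plain Struct rather than an Active Record model.

```ruby
# Hypothetical stand-in for a model with an id and an updated_at timestamp.
Record = Struct.new(:id, :updated_at)

# Each strategy takes a list of records and returns the one to keep.
KEEP_STRATEGIES = {
  oldest: ->(records) { records.min_by(&:id) },
  newest: ->(records) { records.max_by(&:id) },
  most_recently_updated: ->(records) { records.max_by(&:updated_at) }
}.freeze

# Defaults to :oldest, which matches the current lowest-id behavior.
def choose_record_to_keep(records, strategy: :oldest)
  KEEP_STRATEGIES.fetch(strategy).call(records)
end

records = [
  Record.new(1, Time.at(100)),
  Record.new(2, Time.at(300)),
  Record.new(3, Time.at(200))
]

puts choose_record_to_keep(records).id
puts choose_record_to_keep(records, strategy: :most_recently_updated).id
```

Using fetch means an unknown strategy name raises a KeyError instead of silently keeping the wrong record.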

Check it out on GitHub and let us know what you think.

  1. Photo adapted from http://flickr.com/photos/xrrr/2478140383/

@bkeepers

I am Brandon Keepers, and I work at GitHub on making Open Source more approachable, effective, and ubiquitous. I tend to think like an engineer, work like an artist, dream like an astronaut, love like a human, and sleep like a baby.