opensoul.org

Keepin' Sphinx Indexes Fresh

April 30, 2009 code 3 min read

<infomercial-voice>Stale indexes got you down? Embarrassed that your users’ searches are coming up empty? Act now and you can avoid stale indexes with NEW and IMPROVED delayed delta indexing</infomercial-voice>

Ok, maybe it’s not new and improved. It’s actually been around since January, but it’s still awesome. Thinking Sphinx can use delayed_job to keep indexes fresh.

I was slow at jumping on the Sphinx bandwagon for one reason: the index has to be rebuilt to incorporate new data. Delta indexing alleviated some of this by storing frequent changes in a small separate index, but it still had to be occasionally reindexed. It also seemed to only index existing records in my trials with it. New records didn’t ever seem to show up until I rebuilt the whole index.

From what I can tell, delayed delta indexing makes everything Just Work™, and here’s how to use it…

After you’ve setup ThinkingSphinx, set the :delayed property to :delta in your index:

class Listing < ActiveRecord::Base
  define_index do
    indexes title
    indexes description
    indexes user.login, :as => :user

    set_property :delta => :delayed
  end
end

The delayed delta support depends on delayed_job, but if you’re using the gem version, it’s already bundled in. I’m using delayed job for some other things in my project, so I still have it installed separately and that seems to be working just fine.

delayed_job uses a table to keep track of the jobs that need run, so create a migration containing:

create_table :delayed_jobs, :force => true do |table|
   table.integer  :priority, :default => 0
   table.integer  :attempts, :default => 0
   table.text     :handler
   table.string   :last_error
   table.datetime :run_at
   table.datetime :locked_at
   table.datetime :failed_at
   table.string   :locked_by
   table.timestamps
end

And lastly, all you need to do is fire up the worker process:

$ rake thinking_sphinx:delayed_delta

Now whenever changes are made to your models, the index will be updated moments later. And that’s how you keep it fresh!

This content is open source. Suggest Improvements.

@bkeepers

avatar of Brandon Keepers I am Brandon Keepers, and I work at GitHub on making Open Source more approachable, effective, and ubiquitous. I tend to think like an engineer, work like an artist, dream like an astronaut, love like a human, and sleep like a baby.