Is this your first visit? You may want to subscribe to the feed.

Keepin' Sphinx Indexes Fresh

<infomercial-voice>Stale indexes got you down? Embarrassed that your users’ searches are coming up empty? Act now and you can avoid stale indexes with NEW and IMPROVED delayed delta indexing!!</infomercial-voice>

Ok, maybe it’s not new and improved. It’s actually been around since January, but it’s still awesome. Thinking Sphinx can use delayed_job to keep indexes fresh.

I was slow at jumping on the Sphinx bandwagon for one reason: the index has to be rebuilt to incorporate new data. Delta indexing alleviated some of this by storing frequent changes in a small separate index, but it still had to be occasionally reindexed. It also seemed to only index existing records in my trials with it. New records didn’t ever seem to show up until I rebuilt the whole index.

From what I can tell, delayed delta indexing makes everything Just Work™, and here’s how to use it…

After you’ve setup ThinkingSphinx, set the :delayed property to :delta in your index:

class Listing < ActiveRecord::Base
  define_index do
    indexes title
    indexes description
    indexes user.login, :as => :user

    set_property :delta => :delayed
  end
end

The delayed delta support depends on delayed_job, but if you’re using the gem version, it’s already bundled in. I’m using delayed job for some other things in my project, so I still have it installed separately and that seems to be working just fine.

delayed_job uses a table to keep track of the jobs that need run, so create a migration containing:

create_table :delayed_jobs, :force => true do |table| 
   table.integer  :priority, :default => 0 
   table.integer  :attempts, :default => 0 
   table.text     :handler 
   table.string   :last_error 
   table.datetime :run_at 
   table.datetime :locked_at 
   table.datetime :failed_at 
   table.string   :locked_by 
   table.timestamps 
end

And lastly, all you need to do is fire up the worker process:

$ rake thinking_sphinx:delayed_delta

Now whenever changes are made to your models, the index will be updated moments later. And that’s how you keep it fresh!

Code: rails, search, sphinx Apr 29, 2009 ● updated Apr 29, 2009 2 comments

2 comments

  1. Wow thanks a lot! I’ve been looking for info on how to use delayed_job for a while.

    Monica Monica September 11, 2009 at 08:08 PM
  2. You can also use Gearman workers to do this. I created this with my Narada project. I have a worker for indexing, a worker that counts how many documents hae been stored in the database, then at some given number, the index is rebuilt—based off of what is being stored, event-driven.

    CaptTofu CaptTofu September 16, 2009 at 05:00 PM

Speak your mind:

*

*


* I hate spam and will never sell or publish your email address.

(You may use textile in your comments.)

Subscribe

Browse by Tag