opensoul.org

Hack for partial matches in Ferret

I love Ferret (the Ruby port of Lucene, not the fuzzy little creatures, you sicko). But something I fight on every project is that Ferret turns into a bear when you try to get it to do partial matches, like "ferr" matching "ferret" and "ferrari".

Ferret allows you to append an asterisk to your search query ("ferr*"), which works great, but we can’t expect our users to do that because damn Google [1] has set the expectation that search just works; I don’t need to use any funky syntax to find my pogs, Harry Potter gossip or BRATZ [2].

So, we can do this manually in code by appending an asterisk to anything users enter and problem solved, right? Not quite.

  • It breaks if you’re using the StemFilter, which allows you to match variations of words ("happy" would match "happiness" and "happiest")
  • It will only match partials on the last word that the user entered ("Ed Brad" won’t find "Edward Bradley")
  • Apparently the asterisk tells Ferret that there have to be more characters, because full matches no longer work ("ferret*" won’t match "ferret")

So, here’s my hack.

Book.find_by_contents "(#{term})^2 OR (#{term.split.map {|t| t + "*" }.join(' ')})"

This ugly little thing will match exactly what the user entered (making use of stemming and all the magic that comes from it) and give it a little boost in the ranking, or match any part of any of the words entered, giving me partial matches.
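To see what that one-liner actually hands to Ferret, here is a minimal sketch that builds the same string without touching an index. The helper name `partial_match_query` is made up for illustration; it is not part of Ferret or acts_as_ferret.

```ruby
# Hypothetical helper: builds the same query string as the one-liner above.
# It only does string work, so it runs without Ferret installed.
def partial_match_query(term)
  # Append an asterisk to every word so each one can match as a prefix.
  wildcards = term.split.map { |t| t + "*" }.join(' ')
  # The exact (stemmed) phrase gets a boost; the wildcard terms are the fallback.
  "(#{term})^2 OR (#{wildcards})"
end

puts partial_match_query("Ed Brad")
# prints: (Ed Brad)^2 OR (Ed* Brad*)
```

The left side of the OR is the user's input untouched, so stemming still applies to it; the right side is where the partial matches come from.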

I acknowledge that this is an ugly hack at the moment, and will break miserably if the user is any kind of a wizard that knows how to do advanced searches, but it works for now. I have no idea what kind of consequences this will have as far as search performance and such. The goal is to wrap this into a filter.
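One way to soften the "wizard" problem is to escape query syntax before building the string. The sketch below is only that, a sketch: the helper names are hypothetical, and the list of special characters is my assumption about Ferret's query language, so check it against the query parser documentation for your version. Keyword operators such as AND and OR are not handled here.

```ruby
# Assumption: these characters carry meaning in Ferret's query syntax.
SPECIAL_CHARS = /([:()\[\]{}!+"~^\-|<>=*?\\])/

# Hypothetical helper: backslash-escape anything that looks like query syntax.
def escape_query(term)
  term.gsub(SPECIAL_CHARS) { "\\#{$1}" }
end

# Hypothetical helper: same boosted-phrase-plus-wildcards query as before,
# but built from escaped input and returning "" for blank input.
def safe_partial_match_query(input)
  term = escape_query(input.to_s.strip)
  return "" if term.empty?

  wildcards = term.split.map { |t| t + "*" }.join(' ')
  "(#{term})^2 OR (#{wildcards})"
end

puts safe_partial_match_query("Ed Brad")
```

With something like this in place, the call from earlier would become `Book.find_by_contents safe_partial_match_query(term)`, with a guard for the empty string.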

Anyone else have any clever ideas for doing partial matches?

  1. Yes, Google, we love and hate you for raising the bar.
  2. We’re talking normal users here, which excludes anyone that is reading this.

ferret, ruby, and search December 12, 2007

4 Comments

  1. Rex, December 12, 2007

    I’ve done similar things with partial search. Also the fact that you can’t construct this by using query classes but have to use query string is ugly indeed.

    I’ve only done partial search on an untokenized field (eg filename), and the performance seems ok.

  2. Jack, December 12, 2007

    Ugly hack or not, it will come in handy. Nice!

  3. Rob O., March 17, 2008

    Sweet! It works great for me.

    Luckily I found this. It’s baffling that ferret has no option for partial searches.

  4. Jeff, November 25, 2008

    Very nice, understandable as well. Thanks, I was worried about switching from SQL searches to ferret because I couldn’t find the damn option in ferret to turn on partial word matches.

My name is Brandon Keepers. I like to build things, usually in Ruby or JavaScript. I work at GitHub and live in Holland, MI.
