Web Software Architecture and Engineering – Life on the Bleeding Edge

Previously, in Part 4 of this series, I blogged about some difficulties in working with Solr. I am following up with some more lessons learned.
In Part 3 of this series, I spoke about my frustration with wildcards. Solr, out of the box, supports wildcards like ? or * anywhere in the criteria except as the first character. In Part 3, I detailed a known work-around.
Well, to my amazement, in poring over the Solr docs as I have been doing, I found that as of Solr 1.4 there is a way to do this natively! Woo-hoo!
In schema.xml, for your fieldType (in my case fieldType name="text"…, under analyzer type="index"), you can add a new filter called ReversedWildcardFilterFactory. More details at http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ReversedWildcardFilterFactory.
So I added: <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true" maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
Basically, at index time this takes each token in fields of type text and adds a reversed copy of it to the index. You can find what the additional parameters mean online, but setting withOriginal="true" tells Solr to keep the original token alongside the reversed one.
So let’s give an example. Let’s say you are indexing the word “bookshelf”. Before this setting, you could search for “book*” and this record would show just fine. But you could not search for “*shelf”. Now when you index, Solr will store “bookshelf” and “flehskoob”. When you search for “*shelf”, it takes note of the leading *, reverses it to “flehs*”, which would match “flehskoob”, and bingo, you have a match!
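To make the bookshelf example concrete, here is a toy sketch of the idea in Python. It is not Solr's implementation: the function names are made up for illustration, it only handles a single leading or trailing *, and it ignores the maxPos*/maxFraction* tuning parameters entirely.

```python
def index_terms(word, with_original=True):
    """Terms a toy index would hold for `word`: the reversed form,
    plus the original when with_original is True."""
    terms = {word[::-1]}
    if with_original:
        terms.add(word)
    return terms


def rewrite_leading_wildcard(query):
    """Turn '*shelf' into 'flehs*' so it becomes a plain prefix search.
    Queries without exactly one leading '*' are returned unchanged."""
    if query.startswith("*") and len(query) > 1 and "*" not in query[1:]:
        return query[1:][::-1] + "*"
    return query


def matches(query, terms):
    """Prefix match for queries ending in '*', exact match otherwise."""
    rewritten = rewrite_leading_wildcard(query)
    if rewritten.endswith("*"):
        prefix = rewritten[:-1]
        return any(term.startswith(prefix) for term in terms)
    return rewritten in terms


terms = index_terms("bookshelf")
print(sorted(terms))                       # both forms are indexed
print(rewrite_leading_wildcard("*shelf"))  # the leading-* query, reversed
print(matches("*shelf", terms))
print(matches("book*", terms))
```

The point of the trick is that a leading wildcard, which would otherwise mean scanning every term, turns into an ordinary prefix lookup on the reversed term.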
Note: Since you are storing additional data, this will cause your index to grow, but it's worth it!

