Web Software Architecture and Engineering – Life on the Bleeding Edge

Archive for August, 2011

Lessons Learned: Moving from Verity to Solr (Part 8)

I’m a bit behind, but it seems a couple weeks ago Ray Camden finally approved my two CFLib entries for working with Solr.

As I mentioned in my previous posts, this is critical if you are moving from Verity to Solr, or you happen to come up against many of Solr little quirks.

I’m going to take a moment, and go through the UDFs here for your benefit.

First, grab them here: http://cflib.org/udf/solrClean, and http://cflib.org/udf/uCaseWordsForSolr.

Second, you’ll note that SolrClean sounds like the venerable VerityClean UDF. Its basically meant to do something similar: take your input and sanitize it for Solr. Also, SolrClean relies on uCaseWordsForSolr.

SolrClean essentially takes your text and does the following:

  • replaces any commas with OR – so happy,sad => happy OR sad.
  • strips any double spaces
  • strips bad characters
  • cleans up sequences of space characters
  • upper cases reserved words

The last one is especially critical since Solr can treat reserved words differently based on the case used. So we change and to AND, or to OR, and that is what the uCaseWordsForSolr is all about.

As a caveat – I am seeing some issues with the code, and it may or may not have to do with the UDFs. If there are any updates, I’ll let you know. My plans is to put everything up on GitHub anyways. I am also planning to work with a vendor who will take our Solr install to a whole new level implementing, among others: synonyms, field weighting, master/slave setup with replication, upgrading to the latest Solr version, “More Like This” functionality, caching/performance tweaks, paging search!, and so much more – so stay tuned.