I’m a bit behind, but it seems that a couple of weeks ago Ray Camden finally approved my two CFLib entries for working with Solr.
As I mentioned in my previous posts, these are critical if you are moving from Verity to Solr, or if you happen to come up against any of Solr’s many little quirks.
I’m going to take a moment to go through the UDFs here for your benefit.
First, grab them here: http://cflib.org/udf/solrClean, and http://cflib.org/udf/uCaseWordsForSolr.
Second, you’ll note that SolrClean sounds like the venerable VerityClean UDF. It’s basically meant to do something similar: take your input and sanitize it for Solr. Also, SolrClean relies on uCaseWordsForSolr.
SolrClean essentially takes your text and does the following:
- replaces any commas with OR – so happy,sad => happy OR sad.
- strips any double spaces
- strips bad characters
- cleans up sequences of space characters
- uppercases reserved words
The last one is especially critical, since Solr can treat reserved words differently based on the case used. So we change “and” to “AND” and “or” to “OR”, and that is what uCaseWordsForSolr is all about.
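To make the steps above concrete, here is a rough sketch of the same idea in Python. This is not the CFML code from CFLib; the function names simply mirror the UDFs, and both the reserved-word list and the set of "bad" characters are my assumptions for illustration, so check the actual UDFs for the real lists.

```python
import re

# Assumption: the boolean operator words that matter in a Solr query.
RESERVED_WORDS = ("and", "or", "not")

# Assumption: an illustrative set of characters to strip; the real UDF may differ.
BAD_CHARS = re.compile(r'[\[\]{}()^~!:\\+]')


def ucase_words_for_solr(text):
    """Uppercase reserved words, matching whole words only (so 'sandy' is untouched)."""
    pattern = re.compile(r"\b(" + "|".join(RESERVED_WORDS) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: m.group(1).upper(), text)


def solr_clean(text):
    """Sanitize user input before handing it to Solr."""
    text = text.replace(",", " OR ")           # commas become OR
    text = BAD_CHARS.sub(" ", text)            # strip bad characters
    text = re.sub(r"\s+", " ", text).strip()   # collapse runs of whitespace
    return ucase_words_for_solr(text)          # uppercase reserved words
```

With this sketch, `solr_clean("happy,sad")` gives `happy OR sad`, and `solr_clean("cats  and   dogs")` gives `cats AND dogs`. Note that the sketch collapses whitespace in a single pass after stripping characters, rather than as two separate steps.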
As a caveat: I am seeing some issues with the code, and they may or may not have to do with the UDFs. If there are any updates, I’ll let you know. My plan is to put everything up on GitHub anyway. I am also planning to work with a vendor who will take our Solr install to a whole new level by implementing, among other things, synonyms, field weighting, a master/slave setup with replication, an upgrade to the latest Solr version, “More Like This” functionality, caching/performance tweaks, paging search!, and so much more, so stay tuned.
Comments on: "Lessons Learned: Moving from Verity to Solr (Part 8)" (4)
Thanks for your marvelous posting! I definitely enjoyed reading it, you might be a great author. I will be sure to bookmark your blog and may come back someday. I want to encourage you to definitely continue your great writing, have a nice weekend!
Thanks, I appreciate spam like this.
Did you get any further into the “More Like This” functionality? I’d be very interested in reading about that.
We have a project to start using Solr 4.1 (latest) and experiment with some of the even more advanced functionality. Coming soon!