In our Verity days, we used a UDF called VerityClean (still available at CFLib) that did a lot of the grunt work of cleaning keywords for Verity. In fact, the UDF's description says it “strips all invalid characters and word combinations from a search string to prevent verity from crashing.” Awesome, right?
Well, when we moved to Solr, there was no equivalent. Solr can be very picky about its query syntax, so it really helps to have a UDF that:
- Replaces comma with OR
- Strips double spaces
- Strips bad characters
- Cleans up sequences of space characters
- Uppercases Solr terms like AND, OR, etc.
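To give a feel for what those steps do to a search string, here is a minimal sketch in Python. It is not the SolrClean code itself (that is a CFML UDF); the function name `solr_clean_sketch`, the set of characters treated as "bad", and the list of operators are all assumptions made for illustration.

```python
import re

# Assumption: the "bad" characters here are a guess, based on characters
# that have special meaning in Solr/Lucene query syntax.
BAD_CHARS = re.compile(r'[+\-!(){}\[\]^~*?:\\/&|]')

# Assumption: only these three boolean operators are uppercased.
OPERATORS = re.compile(r'\b(and|or|not)\b', re.IGNORECASE)


def solr_clean_sketch(keywords: str) -> str:
    """Hypothetical illustration of the cleanup steps listed above."""
    # Replace commas with OR.
    cleaned = keywords.replace(",", " OR ")
    # Strip bad characters (replaced with a space so words stay separate).
    cleaned = BAD_CHARS.sub(" ", cleaned)
    # Uppercase boolean operators that appear as whole words.
    cleaned = OPERATORS.sub(lambda m: m.group(1).upper(), cleaned)
    # Collapse double spaces and other whitespace runs, then trim the ends.
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return cleaned


print(solr_clean_sketch("cold  fusion, solr and (verity)"))
# prints: cold fusion OR solr AND verity
```

Replacing bad characters with a space rather than deleting them keeps something like `a:b` from collapsing into `ab`; the whitespace cleanup at the end then tidies whatever is left over.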
I just submitted SolrClean and a sister UDF uCaseWordsForSolr to CFLib. Enjoy!
UPDATE: The submissions to CFLib were either never approved or simply disappeared. I’ll bring them to GitHub.
UPDATE 2: The submissions are now available on CFLib. I am preparing a separate post on them.