I’m a bit behind, but it seems that a couple of weeks ago Ray Camden finally approved my two CFLib entries for working with Solr.
As I mentioned in my previous posts, this is critical if you are moving from Verity to Solr, or if you happen to run up against many of Solr’s little quirks.
I’m going to take a moment and go through the UDFs here for your benefit.
First, grab them here: http://cflib.org/udf/solrClean, and http://cflib.org/udf/uCaseWordsForSolr.
Second, you’ll note that solrClean sounds like the venerable VerityClean UDF. It’s basically meant to do something similar: take your input and sanitize it for Solr. Also, solrClean relies on uCaseWordsForSolr.
solrClean essentially takes your text and does the following:
- replaces any commas with OR – so happy,sad => happy OR sad.
- strips any double spaces
- strips bad characters
- cleans up sequences of space characters
- upper cases reserved words
The last one is especially critical, since Solr treats reserved words differently depending on their case: we change and to AND, or to OR, and that is what uCaseWordsForSolr is all about.
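To make the steps above concrete, here is a rough Python sketch of that cleaning logic. The real UDFs are CFML (grab them from the cflib.org links above); the function name, the set of “bad” characters stripped, and the reserved-word list here are my own illustrative assumptions, not the UDFs’ exact behavior.

```python
import re

# Words Solr treats as boolean operators when upper-cased
# (illustrative list, not necessarily the UDF's exact set).
SOLR_RESERVED = {"and": "AND", "or": "OR", "not": "NOT"}

def solr_clean(text):
    # 1. Commas become OR: "happy,sad" -> "happy OR sad"
    text = text.replace(",", " OR ")
    # 2. Strip characters Solr's query parser chokes on
    #    (a sample set; the UDF's list may differ).
    text = re.sub(r"[!{}\[\]^~\\]", "", text)
    # 3. Collapse runs of whitespace into single spaces.
    text = re.sub(r"\s+", " ", text).strip()
    # 4. Upper-case reserved words so Solr sees them as operators.
    return " ".join(SOLR_RESERVED.get(w.lower(), w) for w in text.split())

print(solr_clean("happy,sad"))  # happy OR sad
```

The key design point is step 4: lower-case "and"/"or" are ordinary search terms to Solr, while upper-case AND/OR are boolean operators, which is exactly why uCaseWordsForSolr exists.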
As a caveat, I am seeing some issues with the code, and it may or may not have to do with the UDFs. If there are any updates, I’ll let you know. My plan is to put everything up on GitHub anyway. I am also planning to work with a vendor who will take our Solr install to a whole new level, implementing, among other things: synonyms, field weighting, a master/slave setup with replication, an upgrade to the latest Solr version, “More Like This” functionality, caching/performance tweaks, paged search results, and so much more. Stay tuned.
I have a love/hate relationship with MS SQL Server. While it’s easy to set up and use, it’s a pain to grow and manage. The tools seem to cover only 20% of the normal use cases.
Anyway, wouldn’t it be nice if MS SQL Server actually told you where you are missing indexes and where performance can be improved?
Well, it can! Actually, it won’t tell you unless you ask… so ask: run this script, and it will even give you a column containing the CREATE INDEX statement, ready to go:
```sql
SELECT
    migs.avg_total_user_cost * (migs.avg_user_impact / 100.0) * (migs.user_seeks + migs.user_scans) AS improvement_measure,
    'CREATE INDEX [missing_index_' + CONVERT (varchar, mig.index_group_handle) + '_' + CONVERT (varchar, mid.index_handle)
        + '_' + LEFT (PARSENAME(mid.statement, 1), 32) + ']'
        + ' ON ' + mid.statement
        + ' (' + ISNULL (mid.equality_columns, '')
        + CASE WHEN mid.equality_columns IS NOT NULL AND mid.inequality_columns IS NOT NULL THEN ',' ELSE '' END
        + ISNULL (mid.inequality_columns, '') + ')'
        + ISNULL (' INCLUDE (' + mid.included_columns + ')', '') AS create_index_statement,
    migs.*, mid.database_id, mid.[object_id]
FROM sys.dm_db_missing_index_groups mig
INNER JOIN sys.dm_db_missing_index_group_stats migs ON migs.group_handle = mig.index_group_handle
INNER JOIN sys.dm_db_missing_index_details mid ON mig.index_handle = mid.index_handle
WHERE migs.avg_total_user_cost * (migs.avg_user_impact / 100.0) * (migs.user_seeks + migs.user_scans) > 10
ORDER BY migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans) DESC
```
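For reference, the create_index_statement column in each result row comes back looking roughly like the fragment below. The database, table, and column names here are made up for illustration; yours will come from `mid.statement` and the equality/inequality/included column lists in the missing-index DMVs.

```sql
-- Hypothetical example of a generated statement; names are illustrative only.
CREATE INDEX [missing_index_42_137_Orders]
    ON [MyDb].[dbo].[Orders] ([CustomerID], [OrderDate])
    INCLUDE ([TotalAmount])
```

Review these suggestions before running them: the DMVs score candidates by estimated impact, but they know nothing about your write load or existing near-duplicate indexes.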
More details found here: http://blogs.msdn.com/b/bartd/archive/2007/07/19/are-you-using-sql-s-missing-index-dmvs.aspx