Web Software Architecture and Engineering – Life on the Bleeding Edge

We’ve been using Verity for years. Needless to say, there is a reason I don’t have hair on my head. 🙂
I can’t wait to move to Solr, but in the meantime we made several enhancements to our indexing that I wanted to share.
First, we have close to 30,000 records (many with large text fields) that get indexed almost hourly. These records are constantly being updated.
Every few days we would hit corruption or locking issues, so we decided to do a full purge and re-index nightly to fix this.
With 30,000 records, Verity would freak out. Before, we could simply load everything into memory via CFQuery and pass it to Verity. That soon became unwieldy, not because of memory issues (another topic), but because Verity just couldn’t swallow a batch that large.
So the first thing we did was write an industrial-strength SQL script to page through the records. Here is the process we use:

  1. Purge Index
  2. Use Special SQL Script to grab 500 records
  3. Update Index
  4. Grab next 500
  5. Update Index
  6. Repeat steps 4 and 5 until we reach our record count
  7. Optimize
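The paging script itself didn’t survive in this post, but the loop above can be sketched roughly like this, assuming SQL Server 2005’s ROW_NUMBER() for paging. The table, column, datasource, and collection names here are placeholders, not the actual ones from our system:

```cfml
<!--- Hypothetical sketch of the purge/page/update/optimize cycle.
      "articles", "contentIndex", and "myDSN" are made-up names. --->
<cfindex collection="contentIndex" action="purge">

<cfset pageSize = 500>
<cfset startRow = 1>
<cfset keepGoing = true>

<cfloop condition="keepGoing">
    <!--- Grab the next batch of 500 using ROW_NUMBER() paging --->
    <cfquery name="batch" datasource="myDSN">
        SELECT id, title, body
        FROM (
            SELECT id, title, body,
                   ROW_NUMBER() OVER (ORDER BY id) AS rowNum
            FROM articles WITH (NOLOCK)
        ) AS paged
        WHERE rowNum BETWEEN <cfqueryparam value="#startRow#" cfsqltype="cf_sql_integer">
                         AND <cfqueryparam value="#startRow + pageSize - 1#" cfsqltype="cf_sql_integer">
    </cfquery>

    <cfif batch.recordCount>
        <!--- Push this batch into the Verity collection --->
        <cfindex collection="contentIndex" action="update"
                 type="custom" query="batch"
                 key="id" title="title" body="body">
        <cfset startRow = startRow + pageSize>
    <cfelse>
        <cfset keepGoing = false>
    </cfif>
</cfloop>

<!--- Final step: optimize the collection --->
<cfcollection action="optimize" collection="contentIndex">
```

Keeping each batch at 500 rows means Verity never sees more than it can comfortably chew, and the ROW_NUMBER() subquery keeps the paging entirely on the database side.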

This works like a charm. Well, it did until last night, when we started getting errors from the database. We do lots of SQL Server replication (another topic!), and it seems the tables we were indexing were throwing deadlock errors: SQL Server 2005 found two processes asking for locks on similar data and killed our index process as the victim. We already use the (NOLOCK) hint next to each table in our query, which should have prevented the issue. I found plenty of search results from people complaining that SQL Server just isn’t smart enough to handle simple lock contention like this. Anyways, I digress.
To resolve the issue, I added two additional lines to the SQL query pulling the data:
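The two lines themselves were lost from the original post, so this is a guess rather than the author’s confirmed fix. A plausible pair for a read-only indexing query on SQL Server 2005 would be:

```sql
-- Assumed reconstruction, not the original post's code:
-- read without taking shared locks at all, and if a deadlock
-- happens anyway, volunteer this session as the victim
-- instead of killing the replication process.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SET DEADLOCK_PRIORITY LOW;
```

READ UNCOMMITTED applies the NOLOCK behavior session-wide (so no table hint can be missed), and DEADLOCK_PRIORITY LOW only matters if a deadlock still occurs; both are safe for an index feed that tolerates slightly stale reads.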
Hope this helps folks having similar issues.


Comments on: "Verity Indexing for SQL Server – Some Tips" (5)

  1. Just use SQL Server Full-Text indexing. Easier setup, fast response, and a LOT easier maintenance.

  2. WebChain,

    You’re kidding, right?

  3. Hi,

    I use Verity to regularly index large sets of documents each night.

    I had a few issues, but by setting up a standing backup copy of the collection, I’ve eliminated all of them.

    Here’s how I run it each night:

    1. I place the update process in a server based exclusive lock (through a server variable).
    2. I reroute all active queries to the verity instance copy.
    3. I index the files in the primary instance.
    4. I optimize the primary instance.
    5. I point all queries back to the primary instance.
    6. I index the files in the copy instance.
    7. I optimize the files in the copy instance.
    8. I open up the exclusive lock.

    The only time I’ve ever had issues with Verity is when queries are being run against an updating verity instance. By eliminating that, I’ve had no problems.
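    The rotation above could be sketched in CFML like this. The collection names, application variable, and document path are all hypothetical, not taken from Lyle’s actual setup:

    ```cfml
    <!--- Hypothetical sketch of the two-collection rotation.
          "primaryIdx", "copyIdx", application.activeCollection,
          and c:\docs\ are placeholder names. --->
    <!--- 1. Server-wide exclusive lock so only one rebuild runs at a time --->
    <cflock scope="server" type="exclusive" timeout="30">
        <!--- 2. Route live searches to the copy while the primary rebuilds --->
        <cfset application.activeCollection = "copyIdx">

        <!--- 3-4. Rebuild and optimize the primary --->
        <cfindex collection="primaryIdx" action="refresh" type="path"
                 key="c:\docs\" extensions=".pdf,.doc,.html">
        <cfcollection action="optimize" collection="primaryIdx">

        <!--- 5. Point searches back at the primary --->
        <cfset application.activeCollection = "primaryIdx">

        <!--- 6-7. Bring the copy back in sync as the standby --->
        <cfindex collection="copyIdx" action="refresh" type="path"
                 key="c:\docs\" extensions=".pdf,.doc,.html">
        <cfcollection action="optimize" collection="copyIdx">
    </cflock>
    <!--- 8. The exclusive lock is released when the cflock block exits --->
    ```

    Search pages would then query whichever collection `application.activeCollection` names, so no query ever hits a collection mid-update.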

    I hope this helps,

  4. Lyle, that’s fascinating!

  5. Lyle, yes, problems with Verity usually happen because of active queries running while the indexes are being updated. I really like your approach, but having multiple Verity instances is only an option in CF Enterprise, isn’t it?
