
When to consider Apache Solr


One of the main criticisms people throw at Drupal is that its built-in search is weak. To be fair, it isn’t really Drupal’s fault. MySQL just isn’t built for complex text comparisons, and since Drupal ships with MySQL integration, there isn’t much Drupal core can do to overcome this limitation. Luckily, there is an alternative in the form of Apache Solr.

Apache Solr (pronounced “solar,” like the sun) is a Java-based search platform. Without getting sidetracked into how it actually works, let’s just say that it excels at indexing and searching content, and at returning results in an easily filterable way. The Apache Solr module is a robust integration between Drupal and Solr, and is nearly a drop-in replacement for the core search. The only real drawback to using Solr is that it requires an extra, memory-intensive Java process running on a server somewhere.

With that in mind, I’d like to present some reasons you shouldn’t use Solr, followed by some reasons you should.

Why you don’t need Solr

It’s Resource Heavy: While Solr can help you scale more easily, it takes resources above and beyond the basic LAMP stack. If you have a low-budget server and can’t or don’t want to get another, Solr probably isn’t feasible.

It might be unnecessary: If your content consists primarily of just title/body fields, you probably don’t need Solr. Drupal search is built for this kind of simple text matching, and faceting probably will not be of much use to you.

You’ve got a tight budget: Resource concerns aside, Solr is a complex piece of technology that takes time and experience to configure both on the server side and on the Drupal side. The amount of configuration depends on what you want to do with it, but it will always be more complex than using Drupal’s core search.

Why you need Solr

Performance

Java vs. PHP: The main performance benefit of using Solr comes from avoiding the overhead of doing text comparison in PHP/MySQL. Java generally runs much faster than PHP, and Solr is written specifically for indexing and result comparison, so full-text searching (which is what happens when you look for keywords in a database of content) is much faster. When it comes to faceting (drilling down through a search based on categorization), there simply isn’t any contest. MySQL just isn’t built to do this.

Scalability: One of the main benefits of using Solr is that it exposes a REST-style HTTP interface. That means it can be set up on a different server from your Apache/MySQL installation and queried remotely, so your search capabilities can scale independently of your web and database servers. Another potential win is a hosted Solr solution, like Acquia Search, which means you do not have to set up or manage your own Solr instance.
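To make the remote-query idea concrete, here is a minimal sketch of what a keyword search against a separate Solr server can look like. The host name and the core name "drupal" are made up for illustration; q, rows and wt are standard Solr query parameters. This is not what the Drupal module sends verbatim, just the general shape of a request.

```python
from urllib.parse import urlencode


def build_select_url(base_url, keywords, rows=10):
    """Build the URL for a keyword search against a Solr core's /select handler."""
    params = {
        "q": keywords,  # the user's search keywords
        "rows": rows,   # how many results to return
        "wt": "json",   # ask Solr for a JSON response
    }
    return f"{base_url.rstrip('/')}/select?{urlencode(params)}"


# Hypothetical search server, separate from the web and database servers.
url = build_select_url("http://search.example.com:8983/solr/drupal", "apache solr")
print(url)
```

Because the search is just an HTTP request, the web server only needs to know the address of the search server; moving Solr to bigger hardware or to a hosted service means changing that base URL.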

Advanced Search Features

Faceting: This is probably the number one reason people turn to Apache Solr over Drupal core search. Faceting is the ability to categorize content using a variety of different properties, then filter results based on multiple categories at a time. It’s virtually ubiquitous in online retail these days, and you’ve probably seen it in the form of filters to restrict your results to a given manufacturer, price range, etc. This is particularly useful for content that has a lot of metadata associated with it (lots of taxonomy terms or fields for example). Out of the box, the Drupal Apache Solr module can give you the ability to facet by both fields and taxonomy. Additionally, with each search a user makes, Solr has the ability to return the applicable categories and result counts for each category (as shown in the lead image), a very user-friendly feature.
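As a rough sketch of how faceting looks at the Solr level: a request turns faceting on and names the fields to count, each filter the user has picked becomes a filter query (fq), and Solr's JSON response lists facet values and counts as a flat, alternating list by default. The field names "manufacturer" and "price_range" and the sample counts below are invented for illustration; the Drupal module uses its own field names.

```python
from urllib.parse import urlencode


def facet_query_params(keywords, facet_fields, selected):
    """Build Solr parameters for a faceted search.

    facet_fields: fields to return categories and counts for.
    selected: (field, value) pairs the user has already filtered on.
    """
    params = [("q", keywords), ("wt", "json"), ("facet", "true")]
    params += [("facet.field", field) for field in facet_fields]
    # Each selected filter narrows the results without affecting scoring.
    params += [("fq", f'{field}:"{value}"') for field, value in selected]
    return params


def parse_facet_counts(flat):
    """Turn Solr's flat [value, count, value, count, ...] list into a dict."""
    return dict(zip(flat[::2], flat[1::2]))


params = facet_query_params(
    "laptop", ["manufacturer", "price_range"], [("manufacturer", "Acme")]
)
query_string = urlencode(params)

# Made-up example of one facet_fields entry from a response.
counts = parse_facet_counts(["Acme", 12, "Globex", 7])
```

The counts dictionary is what drives the familiar list of filter links with a result count next to each one.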

Complex Weighting: Another great feature of Solr is the ability to score results based on a number of parameters, both at the time of indexing and at the time of the search. For example, imagine you have a site that provides some content for free, and some content for pay, and you want to prioritize the paid content so that it gets an extra “boost” and tends to appear at the top of the search results. Or, imagine you have two taxonomy vocabularies, one for category, and one for tags. Since the category terms will probably be more indicative of what the main subject of each piece of content is, you might adjust your settings so that matches to the category terms carry more weight than matches to the tags vocabulary.
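One way to express both examples at search time is with Solr's eDisMax query parser, where qf weights the fields that keywords are matched against and bq adds a boost for documents matching an extra condition. This is only a sketch under assumptions: the field names ("category_terms", "tag_terms", "content", "is_paid") and the boost values are hypothetical, and the Drupal module's own bias settings may generate something different.

```python
def weighted_search_params(keywords):
    """Build eDisMax parameters that favor category matches and paid content."""
    return {
        "q": keywords,
        "defType": "edismax",
        # Category matches count three times as much as tag or body matches.
        "qf": "category_terms^3 tag_terms content",
        # Paid content gets an extra boost so it tends to rank higher.
        "bq": "is_paid:true^5",
        "wt": "json",
    }


params = weighted_search_params("search performance")
```

The boost numbers are relative, so tuning usually means running real searches and adjusting until the ordering looks right.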

Smart Searching: Solr can handle misspellings and homophones, proximity search using lat/long coordinates, and much, much more. If you have a specific type of search that needs to be performed, there’s a good chance that Solr can be configured to do it.
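As an illustration of the proximity case, Solr's geofilt filter restricts results to documents within a given distance (in kilometers) of a point. The sketch below assumes the index has a spatial field, here hypothetically named "location", holding each item's coordinates; the coordinates and radius are arbitrary examples.

```python
def nearby_filter_params(keywords, lat, lon, radius_km, field="location"):
    """Build Solr parameters that keep only results near a lat/long point."""
    return {
        "q": keywords,
        "fq": "{!geofilt}",    # apply Solr's distance filter
        "sfield": field,       # spatial field to measure from (assumed name)
        "pt": f"{lat},{lon}",  # center point as "lat,lon"
        "d": radius_km,        # radius in kilometers
        "wt": "json",
    }


params = nearby_filter_params("coffee", 45.15, -93.85, 5)
```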
