Divide & Conquer: Drupal 7, Symfony, Elasticsearch, and AngularJS - Part 1

It isn't just about Drupal here at ActiveLAMP -- when the right project comes along that diverges from the usual demands of content management, we get to use other cool technologies to satisfy more exotic requirements. Last year we had a project that presented us with the opportunity to broaden our arsenal beyond the Drupal toolbox. Basically, we had to build a website which handles a growing amount of vetted content coming in from the site's community and 2 external sources, and the whole catalog is available through the use of a rich search tool and also through a RESTful web service which other of our client's partners can use to search for content to display on their respective websites.

Drupal 7 -- more than just a CMS

We love Drupal and we recognize its power in managing content of varying types and complexity. We at ActiveLAMP have solved a lot of problems with it in the past, and have seen how potent it can be. We were able to map out many of the project's requirements to Drupal functionality and we grew confident that it is the right tool for the project.

We pretty much implemented the majority of the site's content-management, user-management, and access-control functionality with Drupal, from content creation, revision, display, and for printing. We relied heavily on built-in functionality to tie things together. Did I mention that the site and content-base and theme components are bi-lingual? Yeah, the wide foray of i18n modules took care of that.

One huge reason we love Drupal is because of its striving community which drives to make it better and more powerful every day. We leveraged open-sourced modules that the community has produced over the years to satisfy project requirements that Drupal does not provide out-of-the-box.

For starters, we based our project on the Panopoly distribution of Drupal which bundles a wide selection of modules that gave us great flexibility in structuring our pages and saving us precious time in site-building and theming. We leveraged a lot of modules to solve more specialized problems. For example, we used the Workbench suite of modules to take care of the implementation of the review-publish-reject workflow that was essential to maintain the site's integrity and quality. We also used the ZURB Foundation starter theme as the foundation for our site pages.

What vanilla Drupal and the community modules cannot provide us we wrote ourselves, thanks to Drupal's uber-powerful "plug-and-play" architecture which easily allowed us to write custom modules to tell Drupal exactly what we need it to do. The amount of work that can be accomplished by the architecture's hook system is phenomenal, and it elevates Drupal from being just a content management system to a content management framework. Whatever your problem, there most probably is a Drupal module for it.

Flexible indexing and searching with Elasticsearch

A large aspect to our project is that the content we handle should be subject to a search tool available on the site. The criterias for searching do not only demand the support for full-text searches, but also filtering by date-range, categorizations ("taxonomies" in Drupal), and most importantly, geo-location queries and sorting by distance (e.g., within n miles from a given location, etc.) It was readily apparent that SQL LIKE expressions or full-text search queries with the MyISAM engine for MySQL just wouldn't cut it.

We needed a full-pledged full-text search engine that also supports geo-spatial operations. And surprise! -- there is a Drupal module for that (A confession: not really a surprise). The Apache Solr Search modules readily provide us the ability to index all our content straight from Drupal and into Apache Solr, an open-source search platform built on top of the famous Apache Lucene engine.

Despite the comfort that the module provided, I evaluated other options which eventually led us to Elasticsearch, which we ended up using over Solr.

Elasticsearch advertises itself as:

“a powerful open source search and analytics engine that makes data easy to explore”

...and we really found this to be true. Since it is basically a wrapper around Lucene and exposing its features through a RESTful API, it is readily available to any apps no matter which language it is written in. Given the wide proliferation and usage of REST APIs in web development, it puts a familiar face on a not-so-common technology. As long as you speak HTTP, the lingua franca of the Web, you are in business.

Writing/indexing documents into Elasticsearch is straight-forward: represent your content as a JSON object and POST it up into the appropriate endpoints. If you wish to retrieve it on its own, simply issue a GET request together with its unique ID which Elasticsearch assigned it and gave back during indexing. Updating it is also a PUT request away. Its all RESTful and nice.

Making searches is also done through API calls, too. Here is an example of a query which contains a Lucene-like text search (grouping conditions with parentheses and ANDs and ORs), a negation filter, a basic geo-location filtering, and with results sorted by distance from a given location:


POST /volsearch/toolkit_opportunity/_search HTTP/1.1
Host: localhost:9200
{
  "from":0,
  "size":10,
  "query":{
    "filtered":{
      "filter":{
        "bool":{
          "must":[
            {
              "geo_distance":{
                "distance":"100mi",
                "location.coordinates":{
                  "lat":34.493311,
                  "lon":-117.30288
                }
              }
            }
          ],
          "must_not":[
            {
              "term":{
                "partner":"Mentor Up"
              }
            }
          ]
        }
      },
      "query":{
        "query_string":{
          "fields":[
            "title",
            "body"
          ],
          "query":"hunger AND (financial OR finance)",
          "use_dis_max":true
        }
      }
    }
  },
  "sort":[
    {
      "_geo_distance":{
        "location.coordinates":[
          34.493311,
          -117.30288
        ],
        "order":"asc",
        "unit":"mi",
        "distance_type":"plane"
      }
    }
  ]
}

Queries are written following Elasticsearch's own DSL (domain-specific language) which are in the form of JSON objects. The fact that queries are represented as tree of search specifications in the form of dictionaries (or “associative arrays” in PHP parlance) makes them a lot easier to understand, traverse, and manipulate as needed without the need of third-party query builders that Lucene's query syntax leaves to be desired. It is this syntactic sugar that helped convinced us to use Elasticsearch.

What makes Elasticsearch flexible is that it is at some degree schema-less. It really made it quite quick for us to get started and get things done. We just hand it with documents with no pre-defined schema and it just does it job at trying to guess the field types, inferring from the data we provided. We can specify new text fields and filter against them on-the-go. If you decide to start using richer queries like geo-spatial and date-ranges, then you should explicitly declare fields as having richer types like dates, date-ranges, and geo-points to tell Elasticsearch how to index the data accordingly.

To be clear, Apache Solr also exposes Lucene through a web service. However we think Elasticsearch API design is more modern and much easier to use. Elasticsearch also provides a suite of features that lends it to easier scalability. Visualizing the data is also really nifty with the use of Kibana.

The Search API

Because of the lack of built-in access control in Elasticsearch, we cannot just expose it to third-parties who wish to consume our data. Anyone who can see the Elasticsearch server will invariably have the ability to write and delete content from it. We needed a layer that firewalls our search index away from public. Not only that, it will also have to enforce our own simplified query DSL that the API consumers will use.

This is another aspect that we looked beyond Drupal. Building web services isn't exactly within Drupal's purview, although it can be accomplished with the help of third-party modules. However, our major concern was in regards to the operational cost of involving it in the web service solution in general: we felt that the overhead of Drupal's bootstrap process is just too much for responding to API requests. It would be akin to swatting a fruit fly with a sledge-hammer. We decided to implement all search functionality and the search API itself in a separate application and writing it with Symfony.

More details on how we introduced Symfony into the equation and how we integrated together will be the subject of my next blog post. For now we just like to say that we are happy with our decision to split the project's scope into smaller discrete sub-problems because it allowed us to target each one of them with more focused solutions and expand our horizon.