
Divide & Conquer: Drupal 7, Symfony, Elasticsearch, and AngularJS - Part 2

May 1, 2015 · by Bez Hermoso · 7 min read

In the [previous blog post]({% post_url 2015-01-21-divide-and-conquer-part-1 %}) we shared how we implemented the first part of our problem in Drupal and how we decided that splitting our project into discrete parts was a good idea. I'll pick up where we left off and discuss why and how we used Symfony to build our web service instead of implementing it in Drupal.

RESTful services with Symfony

Symfony, being a web application framework, did not really provide built-in user-facing features that we could use immediately, but it gave us development tools and a development framework that expedited the implementation of the various pieces of functionality in our web service. Even though we still had plenty of work cut out for us, the framework took care of the most common problems and the usual plumbing that goes with building web applications. This enabled us to tackle the web service problem with a more focused approach.

Other than my familiarity and proficiency with the framework, the reason we chose Symfony over other web application frameworks is that there is already a well-established ecosystem of Symfony bundles (akin to modules in Drupal) centered around building a RESTful web service: FOSRestBundle provided us with a small framework for defining and implementing our RESTful endpoints, and it handles content-type negotiation and other REST-related plumbing for us. JMSSerializerBundle took care of the complexities of serializing our objects into JSON for our clients to consume. We also wrote our own little bundle that uses Swagger UI to provide beautiful documentation for our API. Any change to our code-base that affects the API automatically updates the documentation, thanks to NelmioApiDocBundle, to which we contributed support for generating Swagger-compliant API specifications.
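To give a flavor of how JMSSerializerBundle works, here is a hedged illustration of declaring an object's JSON representation with annotations (the Article class and its fields are hypothetical, not our actual schema):

<?php

use JMS\Serializer\Annotation as Serializer;

/**
 * @Serializer\ExclusionPolicy("all")
 */
class Article
{
    /**
     * Only exposed properties end up in the JSON output.
     *
     * @Serializer\Expose
     * @Serializer\Type("string")
     */
    protected $title;

    /**
     * @Serializer\Expose
     * @Serializer\SerializedName("published_at")
     * @Serializer\Type("DateTime")
     */
    protected $publishedAt;
}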

We managed to encapsulate all the complexities of our search engine behind our API: not only do we index content sent over by Drupal, we also index thousands of records that we pull from a partner on a daily basis. On top of that, the API appends search results from another search API provided by one other partner should we run out of data to provide. Our consumers don't know this, and neither should Drupal -- we let it worry about content management and sending us the data, that's it.

In fact, Drupal never talks to Elasticsearch directly. It only talks to our API, authenticating itself should it need to write or delete anything. This also means we can deploy the API on another server and keep the search index behind a firewall without breaking Drupal. This way we keep everything discrete and secure.
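To give a rough idea of the fallback behavior described above, here is a simplified sketch; the service and collaborator names are hypothetical stand-ins for our real classes:

<?php

class SearchService
{
    protected $elastic;
    protected $partnerApi;

    public function __construct($elastic, $partnerApi)
    {
        $this->elastic = $elastic;
        $this->partnerApi = $partnerApi;
    }

    public function search(array $criteria, $limit)
    {
        $results = $this->elastic->search($criteria, $limit);

        // If our own index cannot fill the request, append results
        // from the partner's search API to make up the difference.
        if (count($results) < $limit) {
            $more = $this->partnerApi->search($criteria, $limit - count($results));
            $results = array_merge($results, $more);
        }

        return $results;
    }
}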

In the end, we have a REST API with three endpoints:

  1. a secured endpoint that receives content to be validated and indexed, used by Drupal;
  2. a secured endpoint for deleting content from the index, also used by Drupal; and finally,
  3. a public endpoint for searching content that matches the criteria passed via GET parameters, which is used by Drupal and other consumers. (A sketch of this endpoint follows.)
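Here is what the public endpoint might look like as a FOSRestBundle controller. This is a minimal sketch under assumptions: the controller and the app.search service are hypothetical, not our production code:

<?php

use FOS\RestBundle\Controller\Annotations as Rest;
use FOS\RestBundle\Controller\FOSRestController;
use Symfony\Component\HttpFoundation\Request;

class SearchController extends FOSRestController
{
    /**
     * The public search endpoint: criteria come in as GET parameters.
     *
     * @Rest\Get("/search")
     */
    public function getSearchAction(Request $request)
    {
        $results = $this->get('app.search')->search($request->query->all());

        // FOSRestBundle negotiates the response format and
        // JMSSerializerBundle serializes the results for us.
        return $this->handleView($this->view($results, 200));
    }
}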

Symfony and Drupal 8

Symfony is not just a web application framework, but is also a collection of stand-alone libraries that can be used by themselves. In fact, the next major iteration of Drupal will use Symfony components to modernize its implementations of routing & URL dispatch, request and response handling, templating, data persistence and other internals like organizing the Drupal API. This change will definitely enhance the experience of developing Drupal extensions as well as bring new paradigms to Drupal development, especially with the introduction of services and dependency-injection.

Gluing it together with AngularJS

Now that we had a functioning web service, we used AngularJS to implement the rich search tools on our Drupal 7 site.

AngularJS is a front-end web framework for creating rich web applications right in the browser, with no hard dependencies on specific back-ends. This helped us prototype our search tools and the search functionality faster outside of Drupal 7. We made sure that everything we wrote in AngularJS was as self-contained as possible, so that we could just drop it into Drupal 7 and have it running with almost zero extra work. It was just a matter of putting our custom AngularJS directives and/or mini-applications into a panel, which in turn we put into Drupal pages. We have done this AngularJS-as-Drupal-panels approach before in other projects, and it has been really effective and fun to do.
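For the curious, the integration boils down to something like this hypothetical Drupal 7 block (module, file, and directive names are made up for illustration):

<?php

/**
 * Implements hook_block_info().
 */
function mymodule_block_info() {
  return array(
    'search_app' => array('info' => t('Search application')),
  );
}

/**
 * Implements hook_block_view().
 */
function mymodule_block_view($delta = '') {
  if ($delta != 'search_app') {
    return array();
  }
  $path = drupal_get_path('module', 'mymodule');
  return array(
    'subject' => t('Search'),
    'content' => array(
      // The directive is self-contained: Drupal only renders the markup
      // and loads the scripts; AngularJS takes over in the browser.
      '#markup' => '<div ng-app="searchApp"><search-app></search-app></div>',
      '#attached' => array(
        'js' => array(
          $path . '/js/angular.min.js',
          $path . '/js/search-app.js',
        ),
      ),
    ),
  );
}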

To complete the integration, it was just a matter of hooking into Drupal's internal mechanisms to pass authored content along to our indexing API when it is approved, and to delete it from the index when it is unpublished.
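In Drupal 7 terms, this is roughly what such a hook looks like. This is a simplified sketch: mymodule_api_client() is a hypothetical helper that returns a Guzzle client authenticated against our API, and a similar hook_node_insert() would handle new content:

<?php

/**
 * Implements hook_node_update().
 */
function mymodule_node_update($node) {
  if ($node->type != 'article') {
    return;
  }
  $client = mymodule_api_client();
  if ($node->status == NODE_PUBLISHED) {
    // Send the content over to the secured indexing endpoint.
    $client->put('content/' . $node->nid, array(
      'json' => array(
        'title' => $node->title,
        'body' => field_get_items('node', $node, 'body'),
      ),
    ));
  }
  else {
    // Unpublished content gets removed from the index.
    $client->delete('content/' . $node->nid);
  }
}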

Headless Drupal comes to Drupal 8!

The popularity of front-end web frameworks has increased the demand for data to be available via APIs, as templating and other display-oriented tasks have rightfully moved into the domain of client-side languages and out of back-end systems. It's exciting to see that Drupal has taken the initiative and made the content it manages available through APIs out-of-the-box. This means it will be easier to build single-page applications on top of content managed in Drupal. It is something that we at ActiveLAMP are actually itching to try.

Also, now that Google has added support for crawling JavaScript-driven sites for SEO, I think single-page applications will soon rise from being just "experimental" and become a real choice for content-driven websites.

Using Composer dependencies in Drupal modules

We used the Guzzle HTTP client library in one of our modules to communicate with the API in the background. We pulled the library into our Drupal installation by defining it as a project dependency via the Composer Manager module. It was as simple as putting a bare-minimum composer.json file in the root directory of one of our modules:

{
  "require" : {
    "guzzlehttp/guzzle" : "4.\*"
  }
}

...and running these Drush commands during build:

$ drush composer-rebuild
$ drush composer-manager install

The first command collects all defined dependency information in all composer.json files found in modules, and the second command finally downloads them into Drupal's library directory.

Composer is awesome. Learn more about it at getcomposer.org.

How a design decision saved us from a potentially costly mistake

One hard lesson we learned is that it's not ideal to use Elasticsearch as the primary and sole data persistence layer in our API.

During the early stages of developing our web service, we treated Elasticsearch as the sole data store by removing Doctrine2 from our Symfony application and doing away with MySQL completely from our REST API stack. However, we still employed the Repository Pattern and wrote classes to store documents in and retrieve them from Elasticsearch using the elasticsearch-php library. These classes also hide away the details of how objects are transformed into their JSON representation, and vice-versa. We used the jms-serializer library for the data transformations; it's an excellent package that takes care of the complexities behind serializing PHP objects to JSON or XML. (We use the same library for delivering objects through our search API, which could be a topic for a future blog post.)
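To illustrate, here is a stripped-down sketch of what such a repository class might look like; the constructor arguments and the getId() method are assumptions made for the example:

<?php

use ActiveLAMP\AppBundle\Entity\Indexable;
use Elasticsearch\Client;
use JMS\Serializer\SerializerInterface;

class ElasticsearchRepository
{
    protected $client;
    protected $serializer;
    protected $index;
    protected $type;

    public function __construct(Client $client, SerializerInterface $serializer, $index, $type)
    {
        $this->client = $client;
        $this->serializer = $serializer;
        $this->index = $index;
        $this->type = $type;
    }

    public function save(Indexable $entity)
    {
        // The JSON transformation is hidden in here; elasticsearch-php
        // accepts a JSON string as the document body.
        $this->client->index(array(
            'index' => $this->index,
            'type' => $this->type,
            'id' => $entity->getId(),
            'body' => $this->serializer->serialize($entity, 'json'),
        ));
    }

    public function delete(Indexable $entity)
    {
        $this->client->delete(array(
            'index' => $this->index,
            'type' => $this->type,
            'id' => $entity->getId(),
        ));
    }
}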

This setup worked just fine until we had to explicitly define date-time fields in our documents. Since we had used UNIX timestamps for our date-time fields from the beginning, Elasticsearch had mistakenly inferred them to be float fields. The explicit schema conflicted with the inferred one, and we were forced to flush all existing documents before the update could be applied. This prompted us to use a real data store that we treat as the Single Version of Truth and to relegate Elasticsearch to being just an index, lest we lose real data in the future, which would be a disaster.
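Declaring the mapping explicitly up front would have avoided the bad inference. With elasticsearch-php that looks something like this sketch, where the index, type, and field names are hypothetical:

<?php

// Declare date-time fields explicitly so Elasticsearch does not
// infer UNIX timestamps as floats. $client is an Elasticsearch\Client.
$client->indices()->putMapping(array(
    'index' => 'content',
    'type' => 'article',
    'body' => array(
        'article' => array(
            'properties' => array(
                'published_at' => array(
                    'type' => 'date',
                ),
            ),
        ),
    ),
));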

Making this change was easy and almost painless, though, thanks to the level of abstraction that the Repository Pattern provides. We just implemented new repository classes with the help of Doctrine, which talk to MySQL, and dropped them in wherever we had used their Elasticsearch counterparts. We then hooked into Doctrine's event system to get our data automatically indexed as it is written to and removed from the database:

<?php

use ActiveLAMP\AppBundle\Entity\Indexable;
use ActiveLAMP\AppBundle\Model\ElasticsearchRepository;
use Doctrine\Common\EventSubscriber;
use Doctrine\ORM\Event\LifecycleEventArgs;
use Doctrine\ORM\Events;
use Elasticsearch\Common\Exceptions\Missing404Exception;

/**
 * Subscribes to Doctrine entity lifecycle events to keep the
 * Elasticsearch index in sync with the database.
 */
class IndexEntities implements EventSubscriber
{
    protected $elastic;

    public function __construct(ElasticsearchRepository $repository)
    {
        $this->elastic = $repository;
    }

    public function getSubscribedEvents()
    {
        return array(
            Events::postPersist,
            Events::postUpdate,
            Events::preRemove,
        );
    }

    public function postPersist(LifecycleEventArgs $args)
    {
        $entity = $args->getEntity();

        if (!$entity instanceof Indexable) {
            return;
        }

        $this->elastic->save($entity);
    }

    public function postUpdate(LifecycleEventArgs $args)
    {
        $entity = $args->getEntity();

        if (!$entity instanceof Indexable) {
            return;
        }

        $this->elastic->save($entity);
    }

    public function preRemove(LifecycleEventArgs $args)
    {
        $entity = $args->getEntity();

        if (!$entity instanceof Indexable) {
            return;
        }

        try {
            $this->elastic->delete($entity);
        } catch (Missing404Exception $e) {
            // Ignore 404 error if entity is not in the search index prior to deletion.
        }
    }
}

Thanks, Repository Pattern!


Overall, I really enjoyed building out the app with the tools we chose, and I personally like how we put the many parts together. We observed some tenets of service-oriented architecture by splitting the project into multiple discrete problems and solving each with different technologies. We handed Drupal the problems that Drupal knows best and used more suitable solutions for the rest.

Another benefit we reaped is that developers within ActiveLAMP can focus on their own domains of expertise: our Drupal guys take care of Drupal work that non-Drupal guys like me aren't the best fit for, while I can knock out Symfony work which is right up my alley. I think we at ActiveLAMP have seen the value of solving big problems through divide-and-conquer and of being diversified in the technologies we use.
