Blog Post • drupal

Use Google Analytics Instead of the Statistics Module

March 17, 2010 by Kristen Dyrr

I recently created a module that uses the Google Analytics API to capture the top ten nodes of various content types by day, week, and all time. This is a great option for any site that needs to use caching, and can’t use the Statistics module.

The module depends on the google_analytics_api module, whose google_analytics_api_report_data() function makes capturing the data very easy. Here is a simple example of building a report:


<?php
    // Default both dates to today.
    if (!$start_date) {
      $start_date = date('Y-m-d');
    }
    if (!$end_date) {
      $end_date = date('Y-m-d');  // H:i:s // can't include time... if before noon, include previous day
    }
    $dimensions = array('pagePath');
    $metrics = array('visits');
    $sort_metric = array('-visits');
    $filter = 'pagePath =@ /blog/ || pagePath =@ /article/';
    $start_index = 1;
    $max_results = 20;

    // Construct request array.
    $request = array(
      '#dimensions' => $dimensions,
      '#metrics' => $metrics,
      '#sort_metric' => $sort_metric,
      '#filter' => $filter,
      '#start_date' => $start_date,
      '#end_date' => $end_date,
      '#start_index' => $start_index,
      '#max_results' => $max_results,
    );
    try {
      $entries = google_analytics_api_report_data($request);
    }
    catch (Exception $e) {
      return $e->getMessage();
    }
?>

By default, today’s date is used for both the start and end date, to give today’s top content. GA requires both a start and end date, so to get all-time results, you will need to set the start date to the date you first started using GA with your site.
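As a rough sketch of how the day, week, and all-time ranges could be computed, here is a small helper. The function name example_ga_date_range(), the meaning of "week" (today plus the six days before it), and the first-tracked date of 2009-06-01 are all assumptions made for illustration; none of them come from the google_analytics_api module.

```php
<?php
/**
 * Hypothetical helper: returns array($start_date, $end_date) as Y-m-d strings.
 *
 * $period is 'day', 'week', or 'all'. $first_tracked is the date GA tracking
 * began on the site (an assumed value you must supply yourself). $now is an
 * optional Unix timestamp, mainly so the result is easy to test.
 */
function example_ga_date_range($period, $first_tracked, $now = NULL) {
  $now = ($now === NULL) ? time() : $now;
  $end_date = date('Y-m-d', $now);
  switch ($period) {
    case 'week':
      // Assumption: a "week" is today plus the six days before it.
      $start_date = date('Y-m-d', strtotime('-6 days', $now));
      break;

    case 'all':
      // All time starts on the day GA was first enabled for the site.
      $start_date = $first_tracked;
      break;

    default:
      // 'day' (or anything unrecognized): today only.
      $start_date = $end_date;
  }
  return array($start_date, $end_date);
}

// Example: the weekly range as seen from noon on March 17, 2010.
$now = strtotime('2010-03-17 12:00:00');
list($start_date, $end_date) = example_ga_date_range('week', '2009-06-01', $now);
```

The two values can then be passed as '#start_date' and '#end_date' in the request array.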

To get the top content, sorted from most popular to least popular, the dimensions variable needs to be set to “pagePath,” with either a “visits” metric (for unique page views) or a “pageviews” metric (for all views). The sort_metric variable is set to “-visits” (or “-pageviews”) to sort from most visits to least (note the “-” prefix, which tells Google Analytics to sort the results in descending order).
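To keep the metric and its sort key in step, the three related request values can be generated together. This is only a sketch: example_ga_popularity_fields() is a made-up name, and falling back to 'visits' for an unknown metric is my own choice, not module behavior.

```php
<?php
/**
 * Hypothetical helper: builds the dimension, metric, and sort entries for a
 * "most popular pages" report. $metric is 'visits' or 'pageviews'.
 */
function example_ga_popularity_fields($metric = 'visits') {
  // Assumption: anything other than the two metrics discussed falls back
  // to 'visits'.
  if (!in_array($metric, array('visits', 'pageviews'), TRUE)) {
    $metric = 'visits';
  }
  return array(
    '#dimensions' => array('pagePath'),
    '#metrics' => array($metric),
    // The "-" prefix asks for a descending sort on the same metric.
    '#sort_metric' => array('-' . $metric),
  );
}

$fields = example_ga_popularity_fields('pageviews');
```

The returned array can be merged into the rest of the request with array_merge() or the + operator.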

Since I want to grab blogs and articles only, I have set the filter to match only paths that contain “/blog/” or “/article/”. Unfortunately, this is the only way to filter your node types, so it’s a good idea to use pathauto to ensure all node types have a specific path, and write some code that prevents any other node types from having the path you are targeting.
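If the list of path fragments changes often, the filter string can be assembled from an array instead of being typed by hand. A minimal sketch, assuming a made-up helper name example_ga_path_filter(); it only reproduces the "contains" (=@) and "or" (||) syntax already used in the filter above.

```php
<?php
/**
 * Hypothetical helper: builds a GA filter that matches any page path
 * containing one of the given fragments.
 */
function example_ga_path_filter(array $fragments) {
  $clauses = array();
  foreach ($fragments as $fragment) {
    // "=@" means "contains" in the filter syntax used above.
    $clauses[] = 'pagePath =@ ' . $fragment;
  }
  // "||" joins the clauses with a logical OR.
  return implode(' || ', $clauses);
}

$filter = example_ga_path_filter(array('/blog/', '/article/'));
// $filter is now 'pagePath =@ /blog/ || pagePath =@ /article/'
```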

In my case, there were also specific CCK fields I needed to use in order to filter out additional nodes. If you know ahead of time that this is going to happen, you can inject something into the path for nodes that have the CCK fields you would like to exclude, and filter them out when retrieving the report. Otherwise, you will have to do what I did: retrieve more results than are needed in the final report (note that $max_results is set to 20, even though this will eventually be a top ten list), filter out the unwanted nodes with a database query, then unset whatever remains beyond the first ten.
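The over-fetch-then-trim step might look something like the sketch below. It is not the code from my module: example_ga_trim_results() is a hypothetical name, the rows are plain arrays rather than GA entry objects, and $excluded_paths stands in for whatever the database query returns.

```php
<?php
/**
 * Hypothetical helper: drops excluded paths, then keeps the first $limit rows.
 *
 * $rows is assumed to be sorted from most to least visits already, which is
 * how GA returns it when the sort metric is '-visits'.
 */
function example_ga_trim_results(array $rows, array $excluded_paths, $limit = 10) {
  $kept = array();
  foreach ($rows as $row) {
    // Stop as soon as the final list is full.
    if (count($kept) >= $limit) {
      break;
    }
    // Skip rows the (assumed) database query flagged for exclusion.
    if (in_array($row['path'], $excluded_paths, TRUE)) {
      continue;
    }
    $kept[] = $row;
  }
  return $kept;
}

// Made-up sample data: four fetched rows, one excluded, top two wanted.
$rows = array(
  array('path' => '/blog/a', 'visits' => 50),
  array('path' => '/article/b', 'visits' => 40),
  array('path' => '/blog/c', 'visits' => 30),
  array('path' => '/blog/d', 'visits' => 20),
);
$top = example_ga_trim_results($rows, array('/article/b'), 2);
```

Fetching twice the final count is only a guess at how many rows will be excluded; if more than half of them can be filtered out, $max_results needs to be larger.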

One other catch with using Google Analytics in place of Statistics is that it does not work well with cron. You can get it to run through cron when running cron.php manually, but I couldn't find a way to get it to work automatically, even using various spoofing methods. The method will finish without errors, but GA will not return any data.

Cache variables can save the day here! We can modify the code above with the following:


<?php
  if ($cache = cache_get('ga_stats', 'cache_content')) {
    $stats = $cache->data;
  }
  else {
    $stats = array();
    //GA code from above goes here
    if (!empty($entries)) {
      foreach ($entries as $entry) {
        $metrics = $entry->getMetrics();
        // Append one row per entry so later entries do not overwrite earlier ones.
        $stats[] = array('visits' => $metrics['visits']);
        //grab any other data you want here
      }
    }
    if (!empty($stats)) {
      cache_set('ga_stats', $stats, 'cache_content', CACHE_TEMPORARY);
    }
  }
?>

Just replace ga_stats with the name you want for your variable above. In fact, you can create variables for multiple individual pages as well, if you really want to study all the stats for specific pages. You may also want to replace cache_content with a different cache object, such as a custom one created in your own module.
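For per-page variables, one possible approach is to derive the cache ID from the page path. The helper below is a sketch only: example_ga_cache_id() is a hypothetical name, and hashing the path with md5() is simply my way of keeping the ID short and free of odd characters.

```php
<?php
/**
 * Hypothetical helper: builds a cache ID for one page's stats.
 *
 * $base plays the role of 'ga_stats' in the code above.
 */
function example_ga_cache_id($path, $base = 'ga_stats') {
  // md5() gives a fixed-length suffix, however long the path is.
  return $base . ':' . md5($path);
}

$cid = example_ga_cache_id('/blog/my-post');
```

The result would then be used in place of the literal 'ga_stats' in both the cache_get() and cache_set() calls.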

This is only the beginning of what you can do with Google Analytics. If you plan your pages and URLs well, you can capture almost any data you want, even link clicks and page exits. The google_analytics_api module provides plenty of options, and the report API itself offers a wide range of its own.

Here is the main developer page to learn about your report options:
http://code.google.com/apis/analytics/docs/gdata/gdataDeveloperGuide.html
2015-06-24 Update: Documentation for the Google Analytics Reporting API has moved here: https://developers.google.com/analytics/devguides/reporting/

I also found this page really handy:
http://code.google.com/apis/analytics/docs/gdata/gdataReferenceDataFeed...
2015-06-24 Update: Reference guide has moved here: https://developers.google.com/analytics/devguides/reporting/core/v3/reference
Pay special attention to the filters section.

And here is a link to the google_analytics_api module:
http://drupal.org/project/google_analytics_api
