Using Google Analytics Statistics within DSpace
Thank you to Claire Knowles of Edinburgh University, who provides this overview of how they have been able to display statistics from Google Analytics in DSpace.
In 2009 Edinburgh University Digital Library adopted Google Analytics (GA) to track usage statistics within the DSpace repositories it supports on behalf of the Scottish Digital Library Consortium (SDLC). The GA statistics have proven much more reliable than the statistics plugins previously available for DSpace, with which we experienced lost statistics and pageviews inflated by robots.
Unfortunately, the GA statistics for tracked sites are only viewable via the GA dashboard, for which users require a Google account and managed permissions. This limits the visibility of statistics to a few people at each institution. Prompted by the presentation given by Graham Triggs (then working for BioMed Central) at the Open Repositories Conference 2010, we decided to write some code to make the Google Analytics statistics visible to all users of the DSpace installations.
The work has been broken into phases:
1. Capture of downloads in DSpace by Google Analytics.
The basic GA tracking code within DSpace is unable to capture the number of file downloads as these are not links within pages. To address this we added code to the two download links on the item page so that these download actions can be measured. This captured all downloads made from within DSpace, but not those made by users arriving directly at the file from search engines. To capture these statistics we decided to reroute all users back through the item page. This means that they now need two clicks instead of one to reach the download, but it enables us to capture these statistics and also raises the visibility of the repository to users. To reduce the inconvenience, we moved the file download links on the item page from the bottom to the top so that users do not have to scroll down to find them.
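The rerouting decision can be sketched roughly as follows. This is a minimal illustration in Python, not the URLrewrite configuration actually deployed. The `/bitstream/handle/<prefix>/<suffix>/<filename>` and `/handle/<prefix>/<suffix>` path layouts, the host name and the function name `reroute_download` are assumptions made for the example, as is the idea that only requests arriving from outside the repository need rerouting.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Assumed layout of a file download path (hypothetical for this sketch):
#   /bitstream/handle/<prefix>/<suffix>/<filename>
BITSTREAM_PATH = re.compile(
    r"^/bitstream/handle/(?P<prefix>[^/]+)/(?P<suffix>[^/]+)/[^/]+$"
)


def reroute_download(path: str, referrer: Optional[str], repo_host: str) -> Optional[str]:
    """Return the item page path to redirect to, or None to serve the request as-is.

    A file download is rerouted when the request did not come from a page on
    the repository itself, for example when the user followed a search engine link.
    """
    match = BITSTREAM_PATH.match(path)
    if match is None:
        return None  # not a file download, nothing to do
    if referrer and urlparse(referrer).hostname == repo_host:
        return None  # the user clicked a download link on one of our own pages
    # Assumed layout of an item page path: /handle/<prefix>/<suffix>
    return "/handle/{prefix}/{suffix}".format(**match.groupdict())
```

With these assumptions, a request for `/bitstream/handle/1842/123/thesis.pdf` referred by a search engine would be sent to `/handle/1842/123`, while the same request referred by the repository's own item page would be served directly.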
2. Adding page views to each item page within DSpace.
Secondly, we added the number of page views within the last year to the item page. This was a proof of concept which showed that we could connect to the Google Analytics API and pull statistics back into DSpace. We decided to include only the views for the past year to reduce the disparity in pageview counts between older and newer items.
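As an illustration of the kind of query involved, the sketch below builds the parameters for a "page views in the past year" request for one item page. It is written in Python for brevity rather than with the Java API we used. The parameter names (`ids`, `start-date`, `end-date`, `metrics`, `filters`) follow the style of queries built in Google's Query Explorer and should be checked against the current API documentation; the profile ID, the item path and the function name are placeholders invented for the example.

```python
from datetime import date, timedelta
from typing import Dict, Optional


def item_pageviews_query(
    profile_id: str, item_path: str, today: Optional[date] = None
) -> Dict[str, str]:
    """Build query parameters for one item page's views over the past 365 days."""
    today = today or date.today()
    start = today - timedelta(days=365)  # "past year" taken as 365 days (an assumption)
    return {
        "ids": "ga:" + profile_id,                # GA profile to query (placeholder value)
        "start-date": start.isoformat(),          # dates formatted as YYYY-MM-DD
        "end-date": today.isoformat(),
        "metrics": "ga:pageviews",                # only the page view count is needed
        "filters": "ga:pagePath==" + item_path,   # restrict to this item's page
    }
```

Fixing the window to the past year in the query itself means every item is measured over the same period, whatever its age in the repository.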
3. Making statistics viewable within the DSpace web pages.
We decided to make the GA statistics available at three levels (item, collection and repository), as this covers most of the statistics requested by users. Using the Query Explorer provided by Google, we were able to test and refine our queries before starting development. The pages were developed using the Google Analytics Java API, jQuery and the Google Chart Tools to draw graphs and maps.
As we complete the rollout of Google Analytics to all the SDLC partners, we are starting to look at what other statistics we would like to make available, both from Google Analytics and possibly by exposing statistical information about DSpace using Google’s chart tools. One statistic that would be of interest to researchers is download figures collated and presented by author (rather than by item/collection/community).
We have encountered problems separating the item, collection and community statistics within DSpace because all of their URLs are formatted in the same way. We therefore have to query DSpace data to do this and cannot distinguish them using the statistics data alone. If the requested item, file, collection or community is not available in DSpace, an error page is returned. These error pages were being recorded in the same way as successful page views, which led to invalid items being listed in the statistics top ten tables. To prevent this, error pages are now recorded as an error event within Google Analytics.
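A rough sketch of that separation step is shown below, again in Python rather than the Java used in our code. The dictionary mapping handles to object types stands in for the query against DSpace data, and the `/handle/<prefix>/<suffix>` path layout, the row format and the function name are assumptions made for the example. Paths that DSpace does not recognise are skipped, so they cannot appear in a top ten table.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, int]  # (page path, page views), a hypothetical row format


def top_pages_by_type(
    rows: Iterable[Row], handle_types: Dict[str, str], limit: int = 10
) -> Dict[str, List[Row]]:
    """Group statistics rows by DSpace object type and keep the most viewed pages.

    handle_types maps a handle such as "1842/123" to "item", "collection" or
    "community"; here it stands in for a lookup against DSpace data.
    """
    grouped: Dict[str, List[Row]] = defaultdict(list)
    for path, views in rows:
        # The URL alone does not reveal the object type, so look the handle up.
        handle = path[len("/handle/"):] if path.startswith("/handle/") else None
        kind = handle_types.get(handle)
        if kind is None:
            continue  # unknown to DSpace (or not a handle page): leave it out
        grouped[kind].append((path, views))
    return {
        kind: sorted(pages, key=lambda page: page[1], reverse=True)[:limit]
        for kind, pages in grouped.items()
    }
```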
These changes have given us a much greater understanding of how our repository is being used, with the majority of users coming directly from Google. The URLrewrite change led to a doubling of our download statistics, as we now capture users who previously went straight to the download.
Thanks to: Scottish Digital Library Consortium; Stuart Wood and Gareth Johnson of University of Leicester for information on the URLrewrite; Graham Triggs, formerly of BioMed Central and now Symplectic.
The code to enable GA stats within DSpace is freely available from GitHub: https://github.com/seesmith/Dspace-googleanalytics
You can view our collection and item statistic changes at http://www.era.lib.ed.ac.uk
Graham Triggs’s slides from OR2010: http://www.slideshare.net/OpenRepository/enhancing-statistics-google-analytics-and-visualization-apis