Written by Espen Hovlandsdal
Published 2014-03-07

Visualizing the most read articles on VG

D3. Behind this name is a pretty neat concept called Data-Driven Documents. I took a look at the framework last year after seeing a lot of cool demos using it. It’s really flexible and is not tied to a specific form of presentation – you can use D3 to generate an HTML table from an array of numbers, or use the same data to create an interactive SVG bar chart with smooth transitions.

After looking through the different layout algorithms available in D3, I found the treemap algorithm particularly interesting. I’ve seen treemaps used before in both profiling tools and disk usage analyzers, and found them very effective for visualizing the relative difference between numbers. An idea popped into my head: “Maybe this can be used to visualize which articles are being read the most?” I decided to give it a try.


Fetching the data

The easy way to do this would be to export some data from our analytics systems every once in a while. That felt a little too static to me – I wanted something more dynamic, preferably with real-time data.

We’re using Varnish Cache at VG, so most requests never reach the web servers and we can’t just parse the webserver logs. However, using a tool called vstatd/VCS (Varnish Custom Statistics), we can log hits under different keys, filter on those keys, and then sort them by the number of hits. This gives us real-time statistics and also lets us go a couple of minutes back in time, depending on the bucket size and the number of buckets set up in our configuration.

Every time someone reads one of our articles, the request hits Varnish, which assigns it a hash containing the article ID – let’s say ARTICLE-<ID>. We can then filter on keys starting with ARTICLE- and sort the results by the number of hits, in descending order. For every article in the top list, we fetch some basic article information, such as the title, the category and the “lead asset” (usually the main article image). I’ve written a simple node.js application that performs these steps and polls for new information every 5 seconds.
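
As a rough illustration of the filter-and-sort step, here is a minimal sketch. The input shape (an object mapping keys to hit counts) and the function name topArticles are assumptions made for this example; the actual vstatd/VCS response format may look different.

```javascript
// Hypothetical input: an object mapping stat keys to hit counts.
// The real vstatd/VCS output format may differ.
function topArticles(stats, limit) {
  const prefix = 'ARTICLE-';
  return Object.keys(stats)
    // Keep only the keys that represent article reads.
    .filter((key) => key.startsWith(prefix))
    // Strip the prefix so we are left with the article ID.
    .map((key) => ({ id: key.slice(prefix.length), hits: stats[key] }))
    // Most read first.
    .sort((a, b) => b.hits - a.hits)
    .slice(0, limit);
}

const sampleStats = {
  'ARTICLE-101': 40,
  'FRONTPAGE': 900,
  'ARTICLE-202': 75,
  'ARTICLE-303': 12,
};
console.log(topArticles(sampleStats, 2));
```

Keys that do not start with ARTICLE- (such as the made-up FRONTPAGE key above) are ignored, so only article reads end up in the top list.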

Presenting the data

Once all the data has been retrieved and is available in a simple JSON array, presenting it using D3 is fairly simple. We group the results by category, then use the enter/exit pattern of D3 to add, remove and update nodes. The treemap algorithm automatically calculates x, y, width and height for our nodes, based on the defined size of our treemap.
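
To give a feel for what such a layout computes, here is a dependency-free sketch. It is not D3's treemap implementation: D3 uses a squarified tiling by default, while this example uses a much simpler "slice" approach where categories become columns and articles are stacked inside them. The field names (id, category, hits) and all function names are assumptions made for the example.

```javascript
// Group a flat list of articles by category and sum the hits per group.
function groupByCategory(articles) {
  const groups = new Map();
  for (const article of articles) {
    if (!groups.has(article.category)) {
      groups.set(article.category, { name: article.category, hits: 0, children: [] });
    }
    const group = groups.get(article.category);
    group.hits += article.hits;
    group.children.push(article);
  }
  return Array.from(groups.values());
}

// Split a rectangle among items along one axis, proportional to their hits.
function slice(items, rect, horizontal) {
  const total = items.reduce((sum, item) => sum + item.hits, 0);
  let offset = 0;
  return items.map((item) => {
    const share = total > 0 ? item.hits / total : 0;
    const node = horizontal
      ? { x: rect.x + offset, y: rect.y, width: rect.width * share, height: rect.height }
      : { x: rect.x, y: rect.y + offset, width: rect.width, height: rect.height * share };
    offset += horizontal ? node.width : node.height;
    return Object.assign({ item }, node);
  });
}

// Categories become columns; articles are stacked inside each column.
function layoutTreemap(articles, width, height) {
  const columns = slice(groupByCategory(articles), { x: 0, y: 0, width, height }, true);
  return columns.flatMap((column) =>
    slice(column.item.children, column, false).map((cell) => ({
      id: cell.item.id,
      category: column.item.name,
      x: cell.x,
      y: cell.y,
      width: cell.width,
      height: cell.height,
    }))
  );
}

const sampleArticles = [
  { id: 'a', category: 'news', hits: 30 },
  { id: 'b', category: 'sport', hits: 10 },
  { id: 'c', category: 'news', hits: 10 },
];
console.log(layoutTreemap(sampleArticles, 500, 100));
```

Each returned node carries the x, y, width and height needed to position an element, and the area of each rectangle is proportional to the number of hits for that article.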

Conclusion

Having proven to myself that the visualization worked as I had hoped it would, I wanted to wrap it into a dashboard-like prototype that we could put up on a monitor in the office. Basically, there is now a node.js application doing four different tasks:

  • Fetches new lists of the most read articles every few seconds
  • Fetches article information for the top articles
  • Provides a simple data endpoint to retrieve the data we need
  • Serves a static web page that acts as our dashboard
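
The four tasks above could be wired together roughly as follows. This is only a sketch using node's built-in http module, not the actual application: the /data path, the placeholder HTML, and the fetchTopList and fetchArticleInfo functions are all hypothetical and would have to be supplied by the caller.

```javascript
const http = require('http');

// In-memory state that the poller updates and the data endpoint reads.
const state = { articles: [] };

// Decide what to send for a given URL. Kept free of I/O so it is easy to check.
function route(url, currentState) {
  if (url === '/data') {
    // Hypothetical data endpoint returning the current top list as JSON.
    return { status: 200, type: 'application/json', body: JSON.stringify(currentState.articles) };
  }
  if (url === '/') {
    // Placeholder for the static dashboard page.
    return { status: 200, type: 'text/html', body: '<!doctype html><title>Most read</title>' };
  }
  return { status: 404, type: 'text/plain', body: 'Not found' };
}

// fetchTopList and fetchArticleInfo are hypothetical async functions:
// the first resolves to [{ id, hits }], the second to article details for an ID.
async function refresh(fetchTopList, fetchArticleInfo) {
  const top = await fetchTopList();
  state.articles = await Promise.all(
    top.map(async (entry) => Object.assign({}, entry, await fetchArticleInfo(entry.id)))
  );
}

// Start polling every 5 seconds and serve both the dashboard and the data.
function start(fetchTopList, fetchArticleInfo, port) {
  setInterval(() => refresh(fetchTopList, fetchArticleInfo).catch(console.error), 5000);
  return http
    .createServer((req, res) => {
      const reply = route(req.url, state);
      res.writeHead(reply.status, { 'Content-Type': reply.type });
      res.end(reply.body);
    })
    .listen(port);
}
```

Note that start is never called in the sketch itself, so nothing is polled or served until it is invoked with real fetch functions and a port.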

The solution is now available for anyone who wants to give it a try.

Taking it further

I wanted to make it a little more interactive, so I added some options for toggling images and article titles on and off, setting the number of articles to show, frequency of updates and the data timeframe to fetch.

What I found was that with a small timeframe (say, 10 seconds), the data was very dynamic. However, such a short window might not contain enough data to represent the full picture. With a timeframe of 30 seconds, we get a clearer picture of what is going on. If you set it to update every 10 seconds, you still get a moving window that is fairly dynamic, yet more statistically representative.
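
The moving window idea can be sketched like this: hits are collected in 10-second buckets, and the last three buckets are summed to cover 30 seconds. The class name MovingWindow and the bucket format are assumptions for illustration; the real bucketing happens in vstatd/VCS and may work differently.

```javascript
// Keeps the last `size` buckets of hit counts and reports their sum per article.
class MovingWindow {
  constructor(size) {
    this.size = size;
    this.buckets = [];
  }

  // Add the hit counts collected during the latest interval,
  // dropping the oldest bucket once the window is full.
  push(bucket) {
    this.buckets.push(bucket);
    if (this.buckets.length > this.size) {
      this.buckets.shift();
    }
  }

  // Sum the hits per article ID across all buckets currently in the window.
  totals() {
    const sums = {};
    for (const bucket of this.buckets) {
      for (const id of Object.keys(bucket)) {
        sums[id] = (sums[id] || 0) + bucket[id];
      }
    }
    return sums;
  }
}

// Three 10-second buckets give a 30-second window.
const hitWindow = new MovingWindow(3);
hitWindow.push({ a: 5 });
hitWindow.push({ a: 1, b: 4 });
hitWindow.push({ b: 2 });
hitWindow.push({ a: 3 }); // the first bucket ({ a: 5 }) now falls out of the window
console.log(hitWindow.totals());
```

Every update only replaces a third of the data, which is why the picture keeps moving without jumping around as much as a 10-second window would.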

Take a look at the current prototype at mestlest.vg.no. It was a fun project to make, and I will definitely be using D3 more in the future! Hope you like the prototype 🙂
