Visualising Facebook's Population Data on the Web

Over the past month or so (on and off!) I've had the opportunity to explore two really interesting ideas. The first came from a talk I saw at FOSS4G in Bonn this year by Fabian Schindler. Fabian's talk covered how to use GeoTIFF data on the web with a pair of libraries called geotiff.js and plotty.js. You can see a post about his talk here. The talk got me thinking about what sort of GeoTIFF data could be pulled into the web.

Around the same time, a news article caught my eye about Facebook having teamed up with a research institute to create a global dataset of population densities. I later managed to find the data on the Center for International Earth Science Information Network (CIESIN) website here. From there I explored ways of making use of the datasets in the web, as per the talk I'd seen at FOSS4G.

The Problems

Upon examining the raw data I noticed some immediate problems with my plan:

The data is relatively large for the web: a single country could be upward of 100 MB
The data has multiple bands, not all of which may be useful to end users
WebGL isn't great with large textures

To take on all these problems it was necessary to preprocess the data into a more manageable and usable format.

Preprocessing the Data

Solving the listed problems was a relatively tricky endeavour and took a multistep process:

Extract a single band
Downsample the raster (reduce its resolution)
Split the raster into manageable chunks
Let the browser know where these chunks are located

For all of these processes I made use of Python and GDAL. The band extraction was a fairly straightforward process, but the downsampling and splitting were somewhat more complicated.
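As a rough illustration of the first two steps, here is a minimal sketch that assumes a reasonably recent GDAL whose Python bindings expose gdal.Translate. It is not the code from the repo: the function names, the 10% default and the choice of "average" resampling are all assumptions made for the example.

```python
def translate_options(band=1, percent=10.0):
    """Build keyword arguments for gdal.Translate (hypothetical helper).

    Keeps a single band and shrinks the raster to `percent` of its
    original width and height.
    """
    return {
        "bandList": [band],        # 1-based index of the band to keep
        "widthPct": percent,       # output width as a % of the input width
        "heightPct": percent,      # output height as a % of the input height
        "resampleAlg": "average",  # assumption: average pixels when shrinking
    }


def extract_and_downsample(src_path, dst_path, band=1, percent=10.0):
    """Write a single-band, downsampled copy of src_path to dst_path."""
    # Imported lazily so this module still loads where GDAL is not installed.
    from osgeo import gdal
    return gdal.Translate(dst_path, src_path, **translate_options(band, percent))
```

Doing both steps in one Translate call avoids writing an intermediate file, although the right resampling algorithm depends on what the band actually represents.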

You can see the full solutions to the downsampling and band extraction problems on the GitHub repo. Splitting the data was probably the hardest problem to solve, as I struggled to find any examples on the web that didn't call out to the shell from within Python (something I wanted to avoid).

In order to split the data correctly it was necessary to subdivide the raster into a grid of a given size. For this to work we need the top left and bottom right coordinates of every grid cell. After some thought on this mental puzzle, I worked out that you can create an arbitrary (n by n) grid of such coordinates using the following function:

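def create_tiles(minx, miny, maxx, maxy, n):
    """Split a bounding box into an n by n grid of tiles.

    Returns a list of [[ulx, uly], [lrx, lry]] corner pairs (upper left,
    lower right), ordered row by row starting from the top of the extent.
    """
    # float() guards against integer division on Python 2
    tile_width = (maxx - minx) / float(n)
    tile_height = (maxy - miny) / float(n)

    matrix = []

    # j counts rows down from the top (maxy); i counts columns from the left
    for j in range(n, 0, -1):
        for i in range(0, n):
            ulx = minx + tile_width * i
            uly = miny + tile_height * j
            lrx = minx + tile_width * (i + 1)
            lry = miny + tile_height * (j - 1)
            matrix.append([[ulx, uly], [lrx, lry]])

    return matrix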
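To actually cut the raster, each pair of corner coordinates has to be mapped back to a pixel window. The sketch below shows one way that might be done, assuming a north-up raster with a GDAL-style geotransform (origin x, pixel width, 0, origin y, 0, negative pixel height). The function name and the sample numbers are made up for illustration.

```python
def tile_to_pixel_window(tile, geotransform):
    """Convert [[ulx, uly], [lrx, lry]] to (xoff, yoff, xsize, ysize) in pixels.

    Assumes no rotation terms in the geotransform (north-up raster).
    """
    origin_x, pixel_w, _, origin_y, _, pixel_h = geotransform
    (ulx, uly), (lrx, lry) = tile

    # pixel_h is negative for north-up rasters, so y offsets grow downwards
    xoff = int(round((ulx - origin_x) / pixel_w))
    yoff = int(round((uly - origin_y) / pixel_h))
    xend = int(round((lrx - origin_x) / pixel_w))
    yend = int(round((lry - origin_y) / pixel_h))

    return xoff, yoff, xend - xoff, yend - yoff


# Hypothetical 20 x 20 pixel raster covering x 0..10 and y 0..10
example_geotransform = (0.0, 0.5, 0.0, 10.0, 0.0, -0.5)
top_left_window = tile_to_pixel_window([[0.0, 10.0], [5.0, 5.0]], example_geotransform)
bottom_right_window = tile_to_pixel_window([[5.0, 5.0], [10.0, 0.0]], example_geotransform)
```

Rounding to whole pixels means neighbouring tiles share an edge exactly, as long as the same corner coordinate is used on both sides.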

Splitting the raster into tiles allows us to send it in chunks whilst avoiding a tile server or any kind of dynamic backend. I created a JSON file containing metadata for all the resulting files, so the browser can determine each tile's centroid and file location before requesting them.
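A metadata file along these lines could be generated with something like the sketch below. The key names ("file", "bounds", "centroid") and the tile_row_col.tif naming scheme are hypothetical and not necessarily what the repo uses.

```python
import json


def build_manifest(tiles, n, prefix="tile"):
    """Describe each tile's file name, bounds and centroid.

    `tiles` is a list of [[ulx, uly], [lrx, lry]] pairs ordered row by row,
    n tiles per row.
    """
    entries = []
    for index, ((ulx, uly), (lrx, lry)) in enumerate(tiles):
        row, col = divmod(index, n)
        entries.append({
            "file": "{}_{}_{}.tif".format(prefix, row, col),  # hypothetical naming
            "bounds": [ulx, lry, lrx, uly],                   # minx, miny, maxx, maxy
            "centroid": [(ulx + lrx) / 2.0, (uly + lry) / 2.0],
        })
    return {"grid_size": n, "tiles": entries}


# A 2 x 2 grid over a made-up 10 x 10 extent
example_tiles = [
    [[0.0, 10.0], [5.0, 5.0]], [[5.0, 10.0], [10.0, 5.0]],
    [[0.0, 5.0], [5.0, 0.0]], [[5.0, 5.0], [10.0, 0.0]],
]
manifest = build_manifest(example_tiles, n=2)
manifest_json = json.dumps(manifest, indent=2)
```

The browser can then fetch this one small file first and use it to decide which GeoTIFF tiles to request and where to place them.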

Displaying the Data

Displaying the data on the frontend took a little bit of trial and error. I used a combination of OpenLayers 3, plotty.js and geotiff.js to accomplish the end result. geotiff.js allows us to read the GeoTIFF data, and plotty.js allows us to create a canvas element that can be used by OpenLayers 3 to correctly place the elements.

To request and handle the asynchronous loading of the data I used the Fetch API and Promises (I've included polyfills for both in the demo). Once all the promises have resolved, all the TIFFs are loaded into memory. From here, a select dropdown lets us change the colors used to present the data.

The end result looks a little something like this:

Pros and Cons of this Method

Pros

We got to a point where we can display the data on the web
The data can be restyled dynamically client-side
No dynamic backend or tile server required, just static files after preprocessing

Cons

Tiles are actually less size-efficient than one big file, but are necessary to get the data to display
The downsampling factors have to be quite large to reach a reasonable file size
Even tiled, file sizes are quite large (e.g. 9 tiles at 2 MB each is 18 MB, which is a lot for the web)

One interesting consequence of having the raw data client-side, as opposed to a pre-rendered image, is that you could go as far as doing basic visual analytics in the browser. For example, you could find a way to display only the highest or lowest values for a given GeoTIFF.
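The filtering logic for that idea is simple enough to sketch. It is written in Python here to match the rest of the post, whereas in the browser the same steps would run in JavaScript over the decoded raster values. The function name and the use of None as the nodata marker are assumptions.

```python
def keep_highest(values, fraction=0.1, nodata=None):
    """Keep the top `fraction` of valid values; replace the rest with nodata."""
    valid = sorted(v for v in values if v != nodata)
    if not valid:
        return list(values)

    # Always keep at least one value, then find the cut-off for the top slice
    keep = max(1, int(len(valid) * fraction))
    threshold = valid[-keep]

    return [v if v != nodata and v >= threshold else nodata for v in values]


# Made-up band values, with None standing in for nodata
example_band = [1, 5, 3, 9, 7, None, 2, 8, 4, 6, 10]
top_fifth = keep_highest(example_band, fraction=0.2)
```

Showing the lowest values instead would only need the comparison flipped and the threshold taken from the other end of the sorted list.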

Have a Go

The repo is located at: https://www.github.com/JamesLMilner/facebook-data-viz

There are instructions in the README for usage.

Published 23 Dec 2016