Using Geo-data + Carto to Visualise Road Congestion

Huq’s geo-data product contains a broad range of attributes that describe how consumers move and how they interact with the world around them. Some of these attributes are deterministic – that is to say, unequivocal fact – while others are inferred from deterministic data available to our measurement software. Among the characteristics that Huq has built into its dataset is speed, i.e. the speed at which consumers were moving at the time the measurement was taken.

We love using CartoDB to visualise our geo-data, particularly where it comes to showing patterns of movement over time using their handy Torque map functionality. One really compelling showcase we’d like to share with you is how we can illustrate congestion on roads using data taken straight from our dataset – with really stunning effects.

The visualisation requires nothing more than a coordinate pair, a timestamp, our built-in speed value (in kmph) and a way to classify (simplify) the speed data into three buckets – slow, mid and fast. This plays well with Carto’s concept of map styling, which offers colour coding based on up to five classified values (as opposed to, say, a vector gradient distribution with multiple stops, which would make it more adaptive).
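As a rough illustration – ours, not Huq’s production code – the three-bucket classification could be sketched in Python; the function name is our own, and the bounds mirror those used later in this post:

```python
def classify_speed(speed_kph):
    # Map a raw speed reading in km/h to one of the three buckets
    # used for colour coding in Carto's map styling.
    if 0 <= speed_kph <= 40:
        return 'slow'
    if 41 <= speed_kph <= 80:
        return 'mid'
    if 81 <= speed_kph <= 120:
        return 'fast'
    return None  # out-of-range readings get dropped later

print([classify_speed(s) for s in (12, 55, 100, 240)])
# → ['slow', 'mid', 'fast', None]
```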

There is a general limit to how much data Carto can happily render in the browser, particularly when it comes to Torque map animations and blend modes like ‘multiply’ – which are good for pronouncing the spatial density of the data but expensive to calculate – and can slow things down a bit. From our experience, ~250K rows represents a good upper limit.

We therefore snipped a fairly random part of the UK (populated, but not London because that’s too easy, so.. Woking?) using K. Arthur Endsley’s lovely little WKT polygon-maker and wrote a quick query to select the dataset attributes we needed for this task. Also – nice trick – because we’re only interested in an aggregate view of the data over, say, a 24-hour period, we replaced the date component of the timestamp with a hard-coded value but preserved the time component, so that we keep the natural distribution of the data over the ‘day’.

-- in BigQuery Standard SQL:

    select
        latitude,
        longitude,
        datetime(date '2019-01-01', time(local_time)) as timestamp,
        speed_kph
    from `my_events_table`
    where st_contains(st_geogfromtext('<WKT POLYGON>'), st_geogpoint(longitude, latitude))
    group by 1, 2, 3, 4

Note the group by clause at the end, which we have used here as a quick means to deduplicate the data across all four columns in our selection. No point in using up rows from our precious 250K limit unnecessarily. We then classified the output into three natural speed buckets using the bounds 0-40, 41-80 and 81-120 kmph (note, kmph, not mph!) as that would be.. dangerous.

-- after the select statement:

    case
        when speed_kph between 0 and 40 then 'slow'
        when speed_kph between 41 and 80 then 'mid'
        when speed_kph between 81 and 120 then 'fast'
        else null
    end as speed_class

PS. Remember to remove the null class values later.

And finally, take a random sampling of those rows and export the table as CSV to load into Carto.

-- after from clause:
    where rand() <= 0.012

NB. That’s a nice little trick to get a random sample of rows without sorting the dataset, as rand() produces a random float value uniformly distributed between 0 and 1. Setting the condition to 0.012 is a little like saying ‘give me 1.2% of the data within my [ current ] selection (Woking area)’, which in our case equates to roughly 250K records.
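To see why no sorting is needed, here is a quick Python sketch (our illustration, not Huq’s code) of the same filter: each row is kept independently with probability 0.012, so the expected sample size is simply 1.2% of the input, regardless of row order:

```python
import random

def sample_rows(rows, fraction=0.012):
    # Mirror of `where rand() <= 0.012`: keep each row independently
    # with probability `fraction`; expected sample size is
    # fraction * len(rows), and no sort is required.
    return [r for r in rows if random.random() <= fraction]

sample = sample_rows(range(1_000_000))
print(len(sample))  # roughly 12,000 rows
```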

If we were being really good we would also consider that speed is calculated as the distance between the associated points over the time elapsed between measurements. As we have previously covered in some depth, mobile location cannot be relied on for accuracy, and does tend to slip around a bit – by tens of metres at least. This can introduce noise into our speed inference, but there are strategies available for correcting these errors – rather than just discarding the affected points as we have done here.
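For the curious, a minimal sketch of how such a speed value can be derived from two consecutive fixes – the haversine formula and the function names below are our own illustration, not Huq’s implementation:

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    # Great-circle distance in metres between two coordinate pairs.
    r = 6_371_000  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def speed_kph(lat1, lon1, t1, lat2, lon2, t2):
    # Speed = distance between the two fixes over the elapsed seconds.
    # Note: a 30 m positional error across a 60 s gap alone shifts the
    # estimate by 0.5 m/s, i.e. ~1.8 km/h of noise.
    metres = haversine_m(lat1, lon1, lat2, lon2)
    return (metres / (t2 - t1)) * 3.6  # m/s -> km/h
```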