100x Performance Gains with BigQuery Geography Clustering
The BigQuery team introduced support for the geography type some time ago, and since then have not stopped enriching and adding to the functions and performance it offers. BigQuery allows users to run complex geo-spatial analytics directly – harnessing all its power, simplicity, and reliability.
In November 2019, Google announced a significant performance development for GIS-related queries, in that you can now cluster tables using a geography column. This is game-changing for those working heavily with geo-data. By clustering your table using a geography column, BigQuery reduces the amount of data that it needs to read in order to service the query. This makes queries cheaper and run faster when filtering on clustered columns.
Let’s evaluate the benefits of table clustering using a geography column. So that you can test this yourself, let’s use one of those great public datasets curated by Felipe Hoffa, the BigQuery Google Developer Advocate. We’ll use the weather_gsod
dataset – and specifically the two tables named all
and all_geoclustered
. In order for us to test the geography column clustering gains, we need to filter the data based on a geographic property of the data.
For the sake of example, let’s imagine that we want to find the min- and max-temperature values for weather stations located in London. We’ll use our handy polygon tracing tool, Gismap, to draw a rough outline of the city and return the values in WKT (Well-known Text) to use directly in our query.
Putting it together, our query will look like this:
/* specify what to return */ SELECT name, -- name of weather station MIN(temp) AS min_temp, -- min temperature over all time MAX(temp) AS max_temp -- max temperature over all time /* the un-clustered noaa weather measurements table */ FROM `fh-bigquery.weather_gsod.all` /* our geography condition, ie. where the station locations fall within our polygon */ WHERE st_within(point_gis, st_geogfromtext('POLYGON((-0.5191087546253357 51.69057940987663,-0.5850267233753357 51.530249275252345,-0.5685472311878357 51.33504597185959,-0.19226549290658568 51.2285324250183,0.20049573756216432 51.271511154557565,0.3185987649059143 51.55074848767514,0.2856397805309143 51.64628942375177,0.04943372584341432 51.72461916884517,-0.5191087546253357 51.69057940987663))')) /* aggregate */ GROUP BY 1
The results – in degrees F – are as follows:
name | min_temp | max_temp |
ST JAMES PARK | 26.3 | 83.2 |
KENLEY AIRFIELD | 18 | 82.8 |
HEATHROW | 18.6 | 83.4 |
CITY | 22.3 | 92.2 |
BLACKWALL | 37 | 62.2 |
NORTHOLT | 18.4 | 84.1 |
BIGGIN HILL | 15.5 | 90.3 |
PURLEY OAKS | 23 | 81.7 |
LEAVESDEN | 22.3 | 89.1 |
LONDON WEA CENTER | 20 | 85.2 |
KEW-IN-LONDON | 23.3 | 82.4 |
This query reads 9.02GB of data, which is a lot of data to search for something that is geographically very narrow (London vs. the world), so there’s a lot of wastage here.
Now let’s see how the same query performs on the clustered table:
/* specify what to return */ SELECT name, MIN(temp) AS min_temp, MAX(temp) AS max_temp /* the clustered noaa weather measurements table */ FROM `fh-bigquery.weather_gsod.all_geoclusterd` /* our geography condition */ WHERE st_within(point_gis, st_geogfromtext('POLYGON((-0.5191087546253357 51.69057940987663,-0.5850267233753357 51.530249275252345,-0.5685472311878357 51.33504597185959,-0.19226549290658568 51.2285324250183,0.20049573756216432 51.271511154557565,0.3185987649059143 51.55074848767514,0.2856397805309143 51.64628942375177,0.04943372584341432 51.72461916884517,-0.5191087546253357 51.69057940987663))')) /* aggregate query */ GROUP BY 1
The result is obviously the same but, this time, BigQuery reads just 98.97MB of the data – so switching to the clustered table made this same query almost 100x more efficient, and cheaper for you to use. Want to try other locations and test the differences for yourself? Use our GisMap tool to quickly draw and export polygons in WKT or GeoJson formats.
Related Stories
Northern Cities More Adversely Affected by Restrictions than in The South
Mobility levels in the big northern cities failed to recover to their pre-pandemic levels following lockdown in the summer, in contrast to their southern counterparts new data shows.
Measuring Covid-19 Economic Resilience in Towns and Cities | PWC
PWC and Demos use footfall data from Huq Industries to measure the cities that have been most resilient to the economic impacts of Covid-19. With government investing in the safe re-opening of the…
European Professionals Return to Offices Ahead of UK Workers (Again)
EU workers are returning to their places of work after the Christmas break ahead of UK counterparts, with data from Huq Industries putting EU workers 7% ahead in the 20 days since the year began.…
In-Person UK Supermarket Trips Halve While Online Sales Double
Footfall across the UK’s biggest supermarkets has fallen by 51 percent in the last year despite their essential retailer status and booming food sales.
Brexit Effect on Transit Through UK Ports | Financial Times
On page one of todays Financial Times, data from Huq Industries offers an early indication as to how transit through UK ports has been affected since December 31st.
Oxford St Footfall Trends Shows Recovery Getting Faster
Foot traffic on London’s Oxford St paints an encouraging picture of what may come as presence touched pre-Covid levels in the brief interlude between lockdowns ahead of Christmas.
So Far in 2021: UK Ports Transits Down, Delays Increase
In the nine days since the Brexit transition period ended, high-frequency data from Huq Industries shows journeys through UK ports down 26% on January 2020.
Christmas Shopping Footfall on National Highstreets: The Data
With the festive season behind us and the new year begun, Huq looks back at high-street footfall during the Christmas shopping period between December 15-24th for Oxford Street, London and the Grand…
New Data Reveals Elf Industrial Productivity Soared 243pts in December
Elf presence across Christmas workshops in Lapland has reached a new high today, with The Elf Index showing a significant rise.