Managing Massive Tables In BigQuery With Wildcards
BigQuery supports the *
wildcard to reference multiple tables or files. You can leverage this feature to load, extract, and query data across multiple sources, destinations, and tables. Let’s see what you can do with wildcards with some examples.
The first thing is definitely loading the data into BigQuery. If you deal with a very large amount of data you will have, most likely, tens of thousands of files coming from data pipelines that you want to load into BigQuery. Using wildcards, you can easily load data from different files into a single table.
bq load project_id:dataset_name.table_name gs://my_data/input/prefix/* ./schema.json
Also, this is not limited to only one prefix but you can specify multiple ones for example:
bq load project_id:dataset_name.table_name gs://my_data/input/prefix_1/* gs://my_data/input/prefix_5/* gs://my_data/input/prefix_25/* ./schema.json
The command above will load all the files matching all the prefixes into the specified table.
Wildcards can be used in the other direction too. Namely, they can be used to export data from BigQuery to GCS. This is very useful especially because BigQuery limits exports to a single file only to tables smaller than 1GB.
bq extract project_id:dataset_name.table_name gs://my_data/extract/prefix/file_prefix_*.json
The previous command will result in multiple files exported into the my_data
bucket within the prefix extract/prefix/
and all file names will be:
file_prefix_000000000000.json file_prefix_000000000001.json file_prefix_000000000002.json ... file_prefix_000000003792.json file_prefix_000000003793.json
The other very useful use of wildcards is evident in queries. In fact, you can reference multiple tables in a single query by using * to match all the tables in a dataset with the same prefix. For example, consider you have a collection of tables called events_US, events_GB, events_IT, … with all the ISO2 code of each country in our dataset. The following query will return the count per day per country of events of type ‘submit’ in our dataset.
select _table_suffix as country, day, count(*) from events_* where type = 'submit' group by 1,2
You can also filter out matched tables using _table_suffix in the where clause. For example, if you are only interested in Germany, France, and Japan just run the following:
select _table_suffix as country, day, count(*) from events_* where type = 'submit' and _table_suffix in ('DE', 'FR', 'JP') group by 1,2
What I personally like the most of using wildcards is that it enables me to design better, simpler, and more generic analytics queries as well as ETL jobs. The aim of this post was to help you improve your code quality and your productivity.
Happy querying with BigQuery 🙂
Related Stories
Northern Cities More Adversely Affected by Restrictions than in The South
Mobility levels in the big northern cities failed to recover to their pre-pandemic levels following lockdown in the summer, in contrast to their southern counterparts new data shows.
Measuring Covid-19 Economic Resilience in Towns and Cities | PWC
PWC and Demos use footfall data from Huq Industries to measure the cities that have been most resilient to the economic impacts of Covid-19. With government investing in the safe re-opening of the…
European Professionals Return to Offices Ahead of UK Workers (Again)
EU workers are returning to their places of work after the Christmas break ahead of UK counterparts, with data from Huq Industries putting EU workers 7% ahead in the 20 days since the year began.…
In-Person UK Supermarket Trips Halve While Online Sales Double
Footfall across the UK’s biggest supermarkets has fallen by 51 percent in the last year despite their essential retailer status and booming food sales.
Brexit Effect on Transit Through UK Ports | Financial Times
On page one of todays Financial Times, data from Huq Industries offers an early indication as to how transit through UK ports has been affected since December 31st.
Oxford St Footfall Trends Shows Recovery Getting Faster
Foot traffic on London’s Oxford St paints an encouraging picture of what may come as presence touched pre-Covid levels in the brief interlude between lockdowns ahead of Christmas.
So Far in 2021: UK Ports Transits Down, Delays Increase
In the nine days since the Brexit transition period ended, high-frequency data from Huq Industries shows journeys through UK ports down 26% on January 2020.
Christmas Shopping Footfall on National Highstreets: The Data
With the festive season behind us and the new year begun, Huq looks back at high-street footfall during the Christmas shopping period between December 15-24th for Oxford Street, London and the Grand…
New Data Reveals Elf Industrial Productivity Soared 243pts in December
Elf presence across Christmas workshops in Lapland has reached a new high today, with The Elf Index showing a significant rise.