Five Essential Qualities for Geo-data Used in Stock Prediction

In this post we ask Isambard Poulson, CTO of Huq Industries, about some of the key qualities that customers look for when evaluating sources of geo-data for stock prediction.

How did you come to learn about using mobile geo-data for predicting stock performance?

We started engaging with the FS alt-data community back in 2018, when we ran a few trials with hedge funds investing in sectors where they felt our geo-data could help them take a position. Since then I’ve supported customers through multiple evaluations, and as similar questions crop up time and again, we’ve been able to capture those common needs and use them to optimise our product offerings.

How successful have you been in selling mobile geo-data to hedge fund customers?

Data evaluations can take some time to conclude and we have a number of trials pending, but so far we’re seeing an 85% acceptance rate on our data. We’re also pleased to be the only mobile geo-data provider selected for coverage across Europe by one of the world’s largest quant funds, so we must be doing something right.

What would you say are the five critical quality indicators for hedge funds looking to buy your data?

I’ll try not to get too deep into the weeds about different requirements and how they vary, but the starting point in any evaluation is how much data we have.

#1 It must be big

Size is important for our customers as they’re using our data like a research panel. The more users out and about visiting stores, the more there is to see in the data – and in more detail. If you want to work at the individual store level, each store needs a few hundred monthly users generating a few thousand events; at the national level that makes for a very dense panel requirement as a share of population. A smaller panel might mean that you can only reach that critical mass of data at the county, state or region level – which may be fine for some purposes, but more data gives you more flexibility.
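To make that concrete, here’s a quick back-of-the-envelope calculation in Python. All of the figures below are hypothetical placeholders rather than Huq’s actual thresholds, but they show how a per-store user requirement compounds into a national panel-share requirement.

```python
# Back-of-the-envelope panel sizing (all figures hypothetical).
# If each store needs a minimum number of observed monthly users, the
# national panel must be dense enough for every tracked store's
# catchment to contribute that many panellists.

stores_tracked = 5_000        # stores a fund wants to model individually
users_per_store = 300         # minimum monthly panel users per store
overlap_factor = 0.5          # share of users seen at more than one store
population = 67_000_000       # roughly the UK population

unique_panellists = stores_tracked * users_per_store * (1 - overlap_factor)
panel_share = unique_panellists / population

print(f"Panel needed: {unique_panellists:,.0f} devices")
print(f"Share of population: {panel_share:.2%}")
# Panel needed: 750,000 devices
# Share of population: 1.12%
```

Even with generous overlap between stores, store-level resolution quickly demands a panel covering a meaningful percentage of the whole population – which is the density point above.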

#2 It must be stable

The second issue that comes up a lot is the question of panel stability – and that is something mobile geo-data vendors have struggled with, and continue to struggle with. If your objective is to detect changes in how consumers interact with the commercial outlets in their physical environment – i.e. measure footfall – then ideally that should be the only variable within the data.

The problem is that mobile geo-data is derived from partnerships with app providers, whose users come and go, and who themselves can come and go. Further, their audiences can change in size with time – and when you add a new app partner (a good thing) you change the size of your panel which affects the trends you will observe (a bad thing). It goes with the territory to a large extent and while customers are aware that this challenge is inherent, the more that we can do to optimise for this through judicious app partner recruitment and monitoring, the less the burden for them.
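One common mitigation – sketched below in Python as a generic normalisation, not necessarily Huq’s own production methodology – is to express footfall as a rate per active panel device, so that onboarding a new app partner shows up as panel growth rather than as a phantom footfall trend.

```python
# Minimal sketch: normalise raw store visits by the number of active
# panel devices each day, so that a doubling of the panel (e.g. a new
# app partner coming online) doesn't read as a doubling of footfall.

raw_visits = {"2024-01-01": 1_200, "2024-01-02": 1_260, "2024-01-03": 2_400}
active_devices = {"2024-01-01": 80_000, "2024-01-02": 82_000, "2024-01-03": 160_000}

visits_per_10k = {
    day: raw_visits[day] / active_devices[day] * 10_000 for day in raw_visits
}

for day, rate in sorted(visits_per_10k.items()):
    print(day, round(rate, 1))
# 2024-01-01 150.0
# 2024-01-02 153.7   <- genuine uptick in footfall
# 2024-01-03 150.0   <- panel doubled overnight; the rate stays flat
```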

#3 It must be right

Geo-data isn’t commoditised in the way that some people might imagine it is. How data is collected from the device, what constitutes a measurement ‘trigger’, the information that is collected and how – if at all – that data is post-processed varies hugely from vendor to vendor. Huq is the only geo-data vendor built from the ground up to offer footfall data to customers for research, rather than as a by-product of some other business activity (like advertising). That puts us in a unique position to go long on certain dataset characteristics and to optimise for our FS customers’ needs.

Central to that is what we call our ‘resolution methodology’ – that is, how we resolve the basic machine data (coordinates, accuracy, WiFi and other signals) to bricks-and-mortar stores and outlets with confidence. A false association between a user and a place just dulls the trend: everybody ends up doing everything, and there’s no variation to detect. Not just that, but naive approaches like straight-up geo-fencing cannot take into consideration things like multi-storey buildings and malls, which is where a significant proportion of commercial activity takes place – particularly in countries like the US. Reliably tagging visits to the right places is key to quality and sits at the heart of what we do.
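As a toy illustration of the point – with an invented scoring rule and threshold, not Huq’s actual resolution methodology – the Python sketch below weighs candidate venues by distance relative to the fix’s reported accuracy, and deliberately abstains when two candidates (say, stacked units in a mall) can’t be told apart. Abstaining beats a false association, for exactly the trend-dulling reason above.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    r = 6_371_000
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def resolve_visit(lat, lon, accuracy_m, venues, min_margin=0.2):
    """Return the best-matching venue, or None if the fix is ambiguous.

    Each candidate is scored so that the score decays with distance
    relative to the reported accuracy radius. If the top two scores are
    too close (e.g. stacked units sharing a footprint), abstain rather
    than guess.
    """
    scored = []
    for v in venues:
        d = haversine_m(lat, lon, v["lat"], v["lon"])
        scored.append((math.exp(-d / max(accuracy_m, 1.0)), v["name"]))
    scored.sort(reverse=True)
    if len(scored) > 1 and scored[0][0] - scored[1][0] < min_margin:
        return None  # ambiguous: better no tag than a false one
    return scored[0][1]

venues = [
    {"name": "Coffee shop (ground floor)", "lat": 51.5101, "lon": -0.1340},
    {"name": "Bookshop (first floor)",     "lat": 51.5101, "lon": -0.1340},
    {"name": "Supermarket next door",      "lat": 51.5110, "lon": -0.1340},
]
print(resolve_visit(51.5101, -0.1340, accuracy_m=15, venues=venues))
# -> None: two stacked units share the same footprint, so we abstain
```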

#4 It must be rich

Connecting the dataset to existing models and algos – particularly for quant funds, who may well be layering tens or hundreds of public and commercial datasets together to obtain the most accurate sense of signal – is an arduous integration task for customers to get to grips with, and that’s before they’ve received any value from the data. Manually mapping Costa to its parent, Whitbread Plc, then to its ISIN, SEDOL or FIGI identifier – and then updating that mapping when the brand is bought by Coca-Cola – is an unenviable task for anyone to do and maintain. By mapping trading brands to their parent companies and tickers with point-in-time accuracy, we can lower the barriers to entry for customers – which in turn allows them to complete their testing and start benefiting from our products faster.
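Here’s a minimal sketch of what point-in-time mapping can look like. The table layout is illustrative, and while the Costa/Whitbread/Coca-Cola example follows the real acquisition, treat the exact dates and tickers as examples rather than reference data.

```python
from datetime import date

# Illustrative point-in-time mapping: brand -> (parent, ticker, window).
# Costa moved from Whitbread Plc to The Coca-Cola Company when the
# acquisition completed in early 2019; dates below are indicative.
BRAND_MAP = [
    ("Costa Coffee", "Whitbread Plc",         "WTB.L", date(1995, 1, 1), date(2019, 1, 3)),
    ("Costa Coffee", "The Coca-Cola Company", "KO",    date(2019, 1, 4), date.max),
]

def parent_as_of(brand: str, when: date):
    """Resolve a trading brand to its parent and ticker on a given date."""
    for b, parent, ticker, start, end in BRAND_MAP:
        if b == brand and start <= when <= end:
            return parent, ticker
    raise KeyError(f"no mapping for {brand} on {when}")

print(parent_as_of("Costa Coffee", date(2018, 6, 1)))  # ('Whitbread Plc', 'WTB.L')
print(parent_as_of("Costa Coffee", date(2020, 6, 1)))  # ('The Coca-Cola Company', 'KO')
```

The point-in-time dimension matters because a backtest that joins 2018 Costa footfall to KO rather than WTB.L is quietly testing the wrong hypothesis.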

#5 It must be legal

Finally, customers must be able to demonstrate that they have the relevant end-user consents to consume the data and to use it to power their analyses. Geo-data vendors that aggregate data from multiple sources on an S2S basis (i.e. collected by others and provided server-to-server) cannot demonstrate a chain of consent, as they are not in control of collection and have no visibility into the circumstances in which the data originated.

The same applies to measurement quality, and in the worst case to fraudulent data entering the food chain. We only collect data using our highly-optimised SDK from first-party apps that obtain the right end-user consent – and we know they do, because we know where to go to check! It pays to do things properly and not to cut corners, as data obtained in any other way just ends up being thrown away.