Swine Flu Exposes Limits of Google Trends
Swine flu is in. In the rush to cover this latest possible pandemic, newswires are alive with activity, blogs and social networking sites are buzzing, and the CDC and WHO are back in the limelight. This is despite the fact that the number of cases is limited (only 40 confirmed infections have occurred in the US).
The rush of news has been accompanied by a rush to track that news. The WSJ, amongst others, has a tracking website, including a map of infections in North America. Best of all, Google has a map showing how the infection is traveling.
This rush was started by Google Flu Trends, a website that tracks flu-related search queries to estimate influenza levels in different US states. Further studies suggested the same approach might work for other diseases as well.
Analyzing Google Trends
So how has Google Trends, the broader application of the Flu Trends concept, performed in the current scenario? A quick analysis shows that Google search activity did in fact increase over the past few days (see chart; source: Google Trends).
A closer look at the data shows a few items worth mentioning:
- First, while Google Trends does show an increase in search activity on “swine flu,” the first uptick in activity only occurred on April 23. By contrast, the first news stories appeared on April 21 when two cases were confirmed in California.
- Second, Google Trends reports that the majority of search queries were from New Zealand, USA, UK, Canada, and Australia. Only a very small minority were from Mexico. Yet, Mexico is the country supposedly at the heart of the pandemic.
Explaining the Discrepancies
Two years ago, I used a Google Trends-like methodology to track the evolution of climate change as an issue in news coverage. Having worked on that, I can propose a few general reasons why Google Trends is limited in this case.
First, it appears that Google Trends follows actual infections with some time lag. This should not be surprising, as people are not likely to search for a disease before having had some exposure to it. This does not mean that it is not a useful tool for tracking diseases over the long term. At the very least, the response time of a system based on Google Trends might be lower.
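To make the idea of a time lag concrete, here is a minimal sketch that measures how many days search activity trails another signal. The two series, the thresholds, and the helper name `first_day_above` are all hypothetical, chosen only for illustration; they are not real Google Trends or case data.

```python
def first_day_above(series, threshold):
    """Return the index of the first value strictly above threshold, or None."""
    for day, value in enumerate(series):
        if value > threshold:
            return day
    return None


# Hypothetical daily values, one entry per day (illustrative only).
reported_cases = [0, 0, 3, 5, 9, 14]
search_index = [1, 1, 1, 2, 20, 60]

# Assumed thresholds: any reported case counts as a start; a search index
# above 10 counts as a visible uptick.
cases_start = first_day_above(reported_cases, 0)
search_start = first_day_above(search_index, 10)

lag_days = search_start - cases_start
print(f"search activity trails reported cases by {lag_days} days")
```

The result depends heavily on where the thresholds are set, which is part of why a lag estimate like this should be read loosely.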
Second, the current scenario shows that Google Trends is highly susceptible to “noise.” Prior to this outbreak, swine flu was probably not a commonly known disease, and queries on it were extremely rare (if not non-existent). Thus, even the slightest uptick in search activity would show up as a major change. That uptick was provided by the highly charged media coverage of the subject. Given this, one wonders whether the search activity reflects more “noise” than people with a genuine interest in the subject. So, Google Trends is likely to be more accurate where general knowledge of a subject (the baseline) is high, and media coverage (noise) is low.
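The baseline effect is simple arithmetic, and a rough sketch may help. The query counts below are hypothetical numbers picked only to show the effect: the same absolute bump in queries looks enormous against a tiny baseline and negligible against a large one.

```python
def relative_change(baseline: float, current: float) -> float:
    """Return the fractional change from baseline to current."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (current - baseline) / baseline


# Hypothetical daily query counts (illustrative only, not real data).
media_driven_extra = 1000  # the same absolute bump applied to both terms

rare_term = relative_change(baseline=10, current=10 + media_driven_extra)
common_term = relative_change(baseline=100_000, current=100_000 + media_driven_extra)

print(f"rare term:   {rare_term:.0%} increase")
print(f"common term: {common_term:.0%} increase")
```

Under these assumed numbers, the rarely searched term shows a hundredfold jump while the commonly searched term barely moves, even though the extra queries are identical.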
Finally, and most interestingly, why is it that most of the search queries came from the US, while Mexico is more exposed to the disease? Not surprisingly, this methodology only works where a large share of both the population and the media are on the internet.
What Next for Google Trends?
When discussing why most search queries occurred in the US, it is worth noting another fact about the swine flu outbreak – that it has traveled extremely fast. Originating in Mexico, it has been carried to the USA, Spain, and New Zealand. This brings into question the validity of using the geographic source of search queries as a reliable indicator of where the disease actually is.
Still, it may also offer a way to enhance Google Trends. What if Google Trends data were combined with travel data on the number of people traveling from a “hotspot” of an infectious disease? It would be logical to assume that popular destinations, or ones which receive travel groups, would be the most likely next locations for further infections. Thus, a map could potentially be created of not only where the disease is generating interest, but where it might be headed.
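As a rough sketch of what such a combination could look like, the snippet below ranks destinations by their share of travelers arriving from a hotspot and lists search interest alongside. Everything here is an assumption made for illustration: the `Destination` record, the `rank_by_exposure` helper, the placeholder destination names, and the numbers. Neither Google Trends nor any travel data source is actually queried.

```python
from typing import List, NamedTuple, Tuple


class Destination(NamedTuple):
    name: str
    travelers_from_hotspot: int  # hypothetical weekly arrivals from the hotspot
    search_interest: int         # hypothetical search index on a 0-100 scale


def rank_by_exposure(destinations: List[Destination]) -> List[Tuple[str, float, int]]:
    """Sort destinations by their share of travelers from the hotspot, highest first."""
    total = sum(d.travelers_from_hotspot for d in destinations)
    if total == 0:
        raise ValueError("no travel recorded from the hotspot")
    ranked = sorted(destinations, key=lambda d: d.travelers_from_hotspot, reverse=True)
    return [(d.name, d.travelers_from_hotspot / total, d.search_interest) for d in ranked]


# Placeholder destinations with made-up values (illustrative only).
destinations = [
    Destination("Destination A", 500, 80),
    Destination("Destination B", 3000, 15),
    Destination("Destination C", 1500, 40),
]

ranking = rank_by_exposure(destinations)
for name, share, interest in ranking:
    print(f"{name}: {share:.0%} of travelers, search interest {interest}")
```

In this made-up example, the destination receiving the most travelers shows the least search interest, which is the kind of mismatch that search data alone would miss.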
Of course, Google does not have access to such data – though at some point it may decide to acquire a travel operator. But the general lesson is simply that to make Google Trends more useful, search query data needs to be looked at together with real-world data (such as travel data or hospital records).
It is still early days for the swine flu outbreak, but some commentators are already suggesting the “social web” has actually created hysteria rather than helped track the disease. That may be true, but it is hardly a problem of the “social web.” As a reader on the FP pointed out, “Twitter is only a natural extension of a typical neighborhood.”
So, in this “typical neighborhood,” what the swine flu outbreak has done is illustrate where Google Trends does well: in tracking general interest amongst heavy Internet users. But it also exposes limitations: the methodology is (not surprisingly) susceptible to “noise” from media coverage and is biased towards countries and issues that are online. This does not mean that the idea itself is flawed, just that it must be taken with a pinch of salt, and that it needs work, especially interfacing it with real-world data streams, to make it really useful.