The net reveals the ties that bind
BBC News, UK, By Bill Thompson, November 17, 2008
One of the throwaway remarks I sometimes make at conferences is that “Google knows you’re pregnant before you do”. I can say this because the things you search for will change as your life changes, and search engine providers may well be able to spot the significance of these changes because they aggregate data from millions of people.
Now Google’s philanthropic arm, google.org, has shown just what it can do with the data it gathers from us all by offering to predict where ‘flu outbreaks will take place in the USA. It has found that “certain search terms are good indicators of flu activity”, in that they correlate well with reports from the official Centers for Disease Control and Prevention. And it claims that “across each of the nine surveillance regions of the United States, we were able to accurately estimate current flu levels one to two weeks faster than published CDC reports”, a result that could save people’s lives by alerting them to have ‘flu vaccinations earlier than they might otherwise have done.
This is a really interesting piece of work and clearly demonstrates the power of data mining. Its potential usefulness is not limited to health matters. As John Naughton pointed out in The Observer, “everyone I know in business has known for months that the UK is in recession, but it’s only lately that the authorities have been in a position to confirm that – because the official data always lag the current reality.” Perhaps the answer lies buried somewhere in the queries being made online, with company directors or politicians searching for terms that imply a coming recession, like details of redundancy pay or bankruptcy protection.
It isn’t only Google who can do this of course. Its database of queries is vast and fast-growing, but it is only one among many databases that underpin the modern world.
The banking system is really only a collection of connected databases recording who has which assets, while neither government nor business could operate without complex data stores. Soon the national ID register will store details of everyone in the UK, while the forthcoming Communications Data Bill is likely to include proposals to create a vast system that will record details of every e-mail sent, every website surfed and every file downloaded.
As we have seen with flu trends, sometimes the “interesting” knowledge that can be extracted is well concealed until comparisons can be made with other sources: it was the correlation between some search terms and the real-world data that mattered.
Of course, Google has not revealed which search terms it analysed, because doing so would undermine the model’s effectiveness. Unfortunately, it is being equally reticent about how it has ensured that the data it uses is properly anonymised so that users cannot be identified on the basis of their queries. A letter from the Electronic Privacy Information Center (EPIC) and Patient Privacy Rights to Google boss Eric Schmidt has not been answered, leaving those concerned with online privacy uncertain over the broader implications of the project.