Google search data for forensic epidemiology

Two recent papers demonstrate that web search data can be an extraordinarily rich source for forensic epidemiologists:

Web-scale pharmacovigilance: listening to signals from the crowd 
Ryen W White, Nicholas P Tatonetti, Nigam H Shah, Russ B Altman, and Eric Horvitz  
Adverse drug events cause substantial morbidity and mortality and are often discovered after a drug comes to market. We hypothesized that Internet users may provide early clues about adverse drug events via their online information-seeking. We conducted a large-scale study of Web search log data gathered during 2010. We pay particular attention to the specific drug pairing of paroxetine and pravastatin, whose interaction was reported to cause hyperglycemia after the time period of the online logs used in the analysis. We also examine sets of drug pairs known to be associated with hyperglycemia and those not associated with hyperglycemia. We find that anonymized signals on drug interactions can be mined from search logs. Compared to analyses of other sources such as electronic health records (EHR), logs are inexpensive to collect and mine. The results demonstrate that logs of the search activities of populations of computer users can contribute to drug safety surveillance. 
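One simple way to look for the kind of signal described above is to compare two groups of users in a search log: those who searched for both drugs and those who searched for only one. For each group, the sketch computes the share who also searched for a symptom term. It assumes a hypothetical log of (user_id, query) pairs. This is an illustrative co-occurrence comparison, not the method used in the paper. The function name, the case-insensitive substring matching and the symptom list are all assumptions.

```python
from collections import defaultdict


def symptom_rate_by_exposure(log, drug_a, drug_b, symptom_terms):
    """Share of users in each drug-exposure group who also ran a symptom query.

    Hypothetical helper: `log` is an iterable of (user_id, query) pairs and
    matching is plain case-insensitive substring search within single queries.
    """
    drug_a, drug_b = drug_a.lower(), drug_b.lower()
    symptom_terms = [t.lower() for t in symptom_terms]

    # Group every query by the user who issued it.
    queries_by_user = defaultdict(list)
    for user, query in log:
        queries_by_user[user].append(query.lower())

    # group -> [users with a symptom query, users in group]
    counts = {"both": [0, 0], "one": [0, 0]}
    for queries in queries_by_user.values():
        has_a = any(drug_a in q for q in queries)
        has_b = any(drug_b in q for q in queries)
        if not (has_a or has_b):
            continue  # user never searched for either drug
        group = "both" if has_a and has_b else "one"
        counts[group][1] += 1
        if any(term in q for q in queries for term in symptom_terms):
            counts[group][0] += 1

    # NaN marks an empty group rather than a zero rate.
    return {
        group: (hits / total if total else float("nan"))
        for group, (hits, total) in counts.items()
    }
```

A markedly higher rate in the "both" group than in the "one" group is the kind of pattern that would merit a closer look. A real analysis would also need to control for how heavily each user searches overall.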

Seasonality in Seeking Mental Health Information on Google  
John W. Ayers, Benjamin M. Althouse, Jon-Patrick Allem, J. Niels Rosenquist, and Daniel E. Ford 
Methods: All Google mental health queries were monitored in the U.S. and Australia from 2006 to 2010. Additionally, queries were subdivided among those including the terms ADHD (attention deficit-hyperactivity disorder); anxiety; bipolar; depression; anorexia or bulimia (eating disorders); OCD (obsessive-compulsive disorder); schizophrenia; and suicide. A wavelet phase analysis was used to isolate seasonal components in the trends, and based on this model, the mean search volume in winter was compared with that in summer, as performed in 2012.
Results: All mental health queries followed seasonal patterns with winter peaks and summer troughs amounting to a 14% (95% CI 11%, 16%) difference in volume for the U.S. and 11% (95% CI 7%, 15%) for Australia. These patterns also were evident for all specific subcategories of illness or problem. For instance, seasonal differences ranged from 7% (95% CI 5%, 10%) for anxiety (followed by OCD, bipolar, depression, suicide, ADHD, schizophrenia) to 37% (95% CI 31%, 44%) for eating disorder queries in the U.S. Several nonclinical motivators for query seasonality (such as media trends or academic interest) were explored and rejected.
Conclusions: Information seeking on Google across all major mental illnesses and/or problems followed seasonal patterns similar to those found for seasonal affective disorder. These are the first data published on patterns of seasonality in information seeking encompassing all the major mental illnesses, notable also because they likely would have gone undetected using traditional surveillance. 
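The winter-versus-summer comparison in the Methods can be illustrated with a deliberately simpler technique: plain calendar-month averaging, in place of the wavelet phase analysis the authors used. The sketch makes several assumptions:

- The input is an already detrended monthly series given as (month, volume) pairs.
- Winter is December to February and summer is June to August, swapped for the southern hemisphere (for example, Australia).
- The difference is expressed relative to the summer mean.

The function name, the season definitions and the choice of baseline are assumptions, not taken from the paper.

```python
from statistics import mean

# Calendar months treated as winter and summer in each hemisphere (an assumption).
SEASONS = {
    "north": {"winter": {12, 1, 2}, "summer": {6, 7, 8}},
    "south": {"winter": {6, 7, 8}, "summer": {12, 1, 2}},
}


def winter_summer_difference(series, hemisphere="north"):
    """Percent by which mean winter search volume exceeds mean summer volume.

    Hypothetical helper: `series` is a list of (month, volume) pairs with
    month in 1..12, assumed already detrended.
    """
    months = SEASONS[hemisphere]
    winter = [v for m, v in series if m in months["winter"]]
    summer = [v for m, v in series if m in months["summer"]]
    # Relative difference, using the summer mean as the baseline.
    return 100.0 * (mean(winter) - mean(summer)) / mean(summer)
```

Run on a U.S. series with hemisphere="north" or an Australian series with hemisphere="south", a positive result corresponds to the winter peak the paper reports. Unlike a wavelet approach, this sketch yields no confidence interval and is sensitive to any trend left in the data.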


  1. I've looked into this before and became concerned about drawing inferences from it, for two reasons:
    1) these patterns do not hold in some European countries. In particular, Germany fit badly in the analyses I tried.
    2) seasonal variation is the norm in Google searches rather than the exception. Since there is so much seasonal variation in search behaviour in general, it seems a strong claim to say that seasonality in a particular term reflects something about the related phenomenon rather than some still-uncertain feature of internet searching.

    Most searches with seasonal patterns are not that easy to explain either.

  2. Hi Jon,

    Very interesting. Do you have any sources on the seasonality of Google searches? I'd be interested in learning more.


  3. http://googleresearch.blogspot.com/2009/08/on-predictability-of-search-trends.html

    This blog post from the team working on Google Trends says that around half of the top searches on Google can be predicted 12 months ahead, which usually indicates a strong seasonal component.

    In my own research I've looked at the relationships between political survey data and various search terms.


    Many of the searches were strong predictors of survey responses, but it was often necessary to seasonally adjust them first. That is, the phenomenon itself (issue salience) had no seasonal component, while the Google searches that closely tracked it did.

    I'm sympathetic to the idea of using search data in epidemiology but I want to see (and still intend to get round to researching) a validation of these searches against other epidemiological data.
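The comments raise three methodological checks:

- whether a term is more seasonal than search behaviour in general;
- whether a series is predictable 12 months ahead;
- whether to remove the seasonal component before correlating searches with survey data.

These can be sketched as below. This is a minimal illustration under stated assumptions, not the method used by the commenters or by the Google Trends team. It assumes monthly series given as plain lists covering whole years. It measures seasonality as the share of variance explained by calendar-month means, and it uses month-mean subtraction as a crude seasonal adjustment. All function names are hypothetical. Calendar-month means also absorb any within-year trend, so a real analysis would detrend first or use an established decomposition such as STL or X-13ARIMA-SEATS.

```python
from statistics import mean, pvariance


def month_profile(values):
    """Mean of each calendar position (0..11) in a monthly series."""
    return [mean(values[m::12]) for m in range(12)]


def seasonal_share(values):
    """Share of variance explained by calendar-month means (0 = none, 1 = all)."""
    profile = month_profile(values)
    total = pvariance(values)
    if total == 0:
        return 0.0
    residual = pvariance([v - profile[i % 12] for i, v in enumerate(values)])
    return 1.0 - residual / total


def placebo_rank(target, controls):
    """Fraction of control series at least as seasonal as the target series."""
    t = seasonal_share(target)
    return sum(seasonal_share(c) >= t for c in controls) / len(controls)


def seasonal_naive_mae(values):
    """Mean absolute error when each month is predicted by the value 12 months earlier."""
    return mean(abs(values[i] - values[i - 12]) for i in range(12, len(values)))


def persistence_mae(values):
    """Same evaluation window, but predicting each month by the previous month."""
    return mean(abs(values[i] - values[i - 1]) for i in range(12, len(values)))


def seasonally_adjust(values):
    """Subtract each calendar month's deviation from the overall mean."""
    profile = month_profile(values)
    overall = mean(values)
    return [v - (profile[i % 12] - overall) for i, v in enumerate(values)]


def pearson(x, y):
    """Pearson correlation of two equal-length series."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5
```

Each function maps onto one of the checks:

- A placebo_rank close to 1 means most control terms are at least as seasonal as the term of interest. This is the situation the first comment warns about.
- A seasonal_naive_mae well below persistence_mae indicates the kind of 12-month predictability the Google post describes.
- Comparing pearson on raw and on seasonally adjusted search series against survey data mirrors the adjustment described in the last comment.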