Two epic data contests

Two major and exciting data analysis contests were announced (relatively) recently and I thought I'd point our visitors to them:
  • The citations and papers megadatabase Mendeley has announced The Mendeley API Binary Battle, ending on August 31st of this year. The competition is open-form and basically just seeks to find someone who'll create an interesting and popular app that does something to "make science more open." Entries are judged (by, among other people, Tim O'Reilly and the CTOs of Amazon and Thomson-Reuters) on a combination of usage statistics, "viralness", "making science more open", and "coolness." The prize is $10,001, though I should point out that the runner-up gets a Quadricopter.
  • The Heritage Health Prize launches April 4th and seeks to "develop a breakthrough algorithm that uses available patient data, including health records and claims data, to predict and prevent unnecessary hospitalizations." The contest uses high-quality anonymized actual patient data and is expected to run for about two years. The prize is $3 million.
The Netflix Prize is probably the most famous example of a data analysis competition, and apparently resulted in a rather large increase in the efficacy of Netflix's movie preference prediction algorithms. IBM Systems Magazine's blog has a rundown of data prizes in general here.

Personal accounts of climate-related conflict

MediaStorm produced this nice video story for Yale's Environment 360 (there is a written discussion here as well).

For people like me who study climate-induced conflict with statistics and mathematical models, it's important to stay in touch with these personal stories of the people who suffer through these events.

As temperatures rise and water supplies dry up, semi-nomadic tribes along the Kenyan-Ethiopian border increasingly are coming into conflict with each other. When the Water Ends focuses on how worsening drought will pit groups and nations against one another. See the project at http://mediastorm.com/clients/when-the-water-ends-for-yale360


Supplying Africa with climate data to fight disease

[Figure: Climatic suitability for malaria transmission. See more here]
In a Comment in Nature this week, M.C. Thomson et al. make the case that climatological information in Africa is under-supplied to decision-makers, especially in the management of public health (read it here). They suggest that adequate climate information is in short supply because it is a public good, and the value it generates comes through the improvement of other public goods (such as sanitation).

"Climate information is not readily available, so is rarely incorporated into development decisions. At the same time, few public-health institutions or practitioners are equipped to understand or manage the effects of a changing climate, despite major advances in recent years in alerting the health community to its risks.
A dramatic improvement is needed in the availability of relevant and reliable climate data and services, particularly in Africa, where vulnerability to climate is so high. Information — such as historical observations of temperature, ten-day satellite estimates of rainfall, the predicted start date of the rainy season or the likelihood of extreme temperatures in the coming season — should inform the management of all diseases sensitive to climate. These include: malaria, leishmaniasis, acute respiratory infections, intestinal helminths and diarrhoeal diseases. This information could also contribute to food security by providing, for example, early warning for agricultural and livestock pests and diseases.
The following must be put in place within the next decade: new partnerships between the public-health community and national meteorological agencies, space agencies and researchers; a governance structure that ensures data sharing between public and private agencies; a funding model that builds open-access climate databases; climate scientists focused on the delivery of quality products, tailored to user needs; health professionals trained to demand and use climate information; and evidence of the value of all this, relative to alternative investments in health."
The authors use the example of Kericho, Kenya to make their point clear:
(Blue shading is the number of malaria cases per month)

"That it took a decade to establish a robust analysis of climate trends in Kericho, a focus of so much controversy, points to a broader disconnect between those who need climate information and those who produce it. In the 1980s, African meteorological agencies were encouraged to sell their data to raise revenue to maintain their networks of meteorological stations. The agencies' services have understandably prioritized their primary client, the airline industry. Access for non-commercial purposes, including for malaria research, has been constrained by poor collaboration and high data fees, among other factors. 
Instead, funding models are needed that recognize climate data as a resource for development — a classic public good that increases in value the more times the data are used." 


ICPSR Social Science Data Repository

On a recent visit to the library (looking for old Haitian census data), a librarian helped me use the Inter-University Consortium for Political and Social Research data repository.  This group collects and curates a huge database of datasets and facilitates resource exchanges across universities (I used it to obtain hard copies of old censuses from Olin Library at Cornell). A link has been added to the Meta-Resources Page.


Nature Climate Change Publishes first articles

Nature Climate Change is a new, high-profile and interdisciplinary journal focused on climate change research.  Its first research articles were released last week, abstracts below.

David B. Lobell, Marianne Bänziger, Cosmos Magorokosho & Bindiganavile Vivek

New approaches are needed to accelerate understanding of climate impacts on crop yields, particularly in tropical regions. Past studies have relied mainly on crop-simulation models or statistical analyses based on reported harvest data, each with considerable uncertainties and limited applicability to tropical systems. However, a wealth of historical crop-trial data exists in the tropics that has been previously untapped for climate research. Using a data set of more than 20,000 historical maize trials in Africa, combined with daily weather data, we show a nonlinear relationship between warming and yields. Each degree day spent above 30 °C reduced the final yield by 1% under optimal rain-fed conditions, and by 1.7% under drought conditions. These results are consistent with studies of temperate maize germplasm in other regions, and indicate the key role of moisture in maize’s ability to cope with heat. Roughly 65% of present maize-growing areas in Africa would experience yield losses for 1 °C of warming under optimal rain-fed management, with 100% of areas harmed by warming under drought conditions. The results indicate that data generated by international networks of crop experimenters represent a potential boon to research aimed at quantifying climate impacts and prioritizing adaptation responses, especially in regions such as Africa that are typically thought to be data-poor.

A. Spence, W. Poortinga, C. Butler & N. F. Pidgeon

One of the reasons that people may not take action to mitigate climate change is that they lack first-hand experience of its potential consequences. From this perspective, individuals who have direct experience of phenomena that may be linked to climate change would be more likely to be concerned by the issue and thus more inclined to undertake sustainable behaviours. So far, the evidence available to test this hypothesis is limited, and in part contradictory. Here we use national survey data collected from 1,822 individuals across the UK in 2010, to examine the links between direct flooding experience, perceptions of climate change and preparedness to reduce energy use. We show that those who report experience of flooding express more concern over climate change, see it as less uncertain and feel more confident that their actions will have an effect on climate change. Importantly, these perceptual differences also translate into a greater willingness to save energy to mitigate climate change. Highlighting links between local weather events and climate change is therefore likely to be a useful strategy for increasing concern and action.


Social Conflict in Africa Database

The University of Texas at Austin has put together a new dataset to support conflict research in Africa. Some nice features of this dataset are that it codes many types of conflict, not just civil wars, and that the conflicts have "issues" associated with them (e.g., "democracy" or "environmental degradation"). The database is searchable and downloadable, and the website has some nice discussions of ongoing research projects.
The Social Conflict in Africa Database (SCAD) is designed to provide users with a comprehensive, methodologically rigorous resource for analyzing social conflict events across the African continent, including all countries with a population of more than 1 million.  It compiles events reported by the Associated Press and Agence France Presse from 1990-2009.  SCAD is designed for use by academic researchers, as well as by journalists, non-governmental organizations, policy makers, and others interested in African politics.

Each record in SCAD refers to a unique social conflict event.  To define an event, the researchers determined the principal actor(s) involved, the target(s), as well as the issues at stake.  Events can last a single day or several months.  A conflict is coded as a single event if the actors, targets, and issues are the same and if there is a distinct, continuous series of actions over time.
Some interesting plots from the website:

Sustainable Development Articles

Our colleague James Rising points us to his newly coded up creation Sustainable Development Articles:
I put together a site for collecting Sustainable Development papers and articles! This is for all of us who wish there were a better way to find, share, and keep track of good SD reading material.

If you use Mendeley [previously blogged on FE here] for organizing your papers, you can drag-and-drop papers into the site by joining the Mendeley group "Sustainable Development to sd.existencia.org Bridge" and putting the papers there. With a delay (~ 1 hr), they'll show up on the site!
There's already a bunch of great stuff up there. Go take a look.


Welcome High School for Environmental Studies students!

Greetings and welcome to students from the High School for Environmental Studies!

Sol and I have gotten wind that you're going to be our online guests leading up to our talk on April 5th. We're very happy to have you guys here and hope you're as excited as we are. We've set this page up so you have an easy and accessible place to find out more about climate change, sustainable development, and our research. In particular you might want to check out any post tagged with the HSES tag, since those are going to be oriented more specifically towards you. Please feel free to check back every now and then for new content and links, and if you have any questions don't hesitate to email either of us (Jesse: jka2110@columbia.edu Sol: smh2137@columbia.edu).

Assorted reference links:
Previous posts on the blog that you might like:
General advice on getting ready for college, research, environmentalism, etc.:

Understanding Nuclear Accidents

As if the direct damages from the Sendai earthquake / tsunami weren't enough, there's now ample concern that one of the nuclear power plants damaged by the quake might melt down. Ed Lyman of the Union of Concerned Scientists' All Things Nuclear blog has an overview of what's going on and how this could make an already horrific disaster worse:
I have done considerable analysis on the safety risks associated with using MOX fuel in light-water reactors. The use of MOX generally increases the consequences of severe accidents in which large amounts of radioactive gas and aerosol are released compared to the same accident in a reactor using non-MOX fuel, because MOX fuel contains greater amounts of plutonium and other actinides, such as americium and curium, which have high radio-toxicities.
Because of this, the number of latent cancer fatalities resulting from an accident could increase by as much as a factor of five for a full core of MOX fuel compared to the same accident with no MOX. Fortunately, as noted above, the fraction of the fuel in this reactor that is MOX is small. Even so, I would estimate this could cause a roughly 10% increase in latent cancer fatalities if there were a severe accident with core melt and containment breach, which has not happened at this point and hopefully will not.
They also have an excellent two page pdf describing how nuclear power plants work and why they can be so unstable.

More broadly, one of our better estimates of the long-term impacts of nuclear fallout was actually done by Doug Almond, one of the SDev Ph.D. program's core faculty. From his 2009 Quarterly Journal of Economics article with Lena Edlund and Mårten Palme:
We use prenatal exposure to Chernobyl fallout in Sweden as a natural experiment inducing variation in cognitive ability. Students born in regions of Sweden with higher fallout performed worse in secondary school, in mathematics in particular. Damage is accentuated within families (i.e., siblings comparison) and among children born to parents with low education. In contrast, we detect no corresponding damage to health outcomes. To the extent that parents responded to the cognitive endowment, we infer that parental investments reinforced the initial Chernobyl damage. From a public health perspective, our findings suggest that cognitive ability is compromised at radiation doses currently considered harmless.
There's accompanying video of the Chernobyl plume's spread here. And not to be terribly alarmist, but western North America is directly downwind from Japan.

Does Daylight Saving Time Save Energy?

While reflecting on my loss of sleep this past weekend, I found this working paper from a few years ago:

Does Daylight Saving Time Save Energy? Evidence from a Natural Experiment in Indiana

Matthew J. Kotchen, Laura E. Grant
NBER Working Paper No. 14429

Abstract: The history of Daylight Saving Time (DST) has been long and controversial. Throughout its implementation during World Wars I and II, the oil embargo of the 1970s, consistent practice today, and recent extensions, the primary rationale for DST has always been to promote energy conservation. Nevertheless, there is surprisingly little evidence that DST actually saves energy. This paper takes advantage of a natural experiment in the state of Indiana to provide the first empirical estimates of DST effects on electricity consumption in the United States since the mid-1970s. Focusing on residential electricity demand, we conduct the first-ever study that uses micro-data on households to estimate an overall DST effect. The dataset consists of more than 7 million observations on monthly billing data for the vast majority of households in southern Indiana for three years. Our main finding is that -- contrary to the policy's intent -- DST increases residential electricity demand. Estimates of the overall increase are approximately 1 percent, but we find that the effect is not constant throughout the DST period. DST causes the greatest increase in electricity consumption in the fall, when estimates range between 2 and 4 percent. These findings are consistent with simulation results that point to a tradeoff between reducing demand for lighting and increasing demand for heating and cooling. We estimate a cost of increased electricity bills to Indiana households of $9 million per year. We also estimate social costs of increased pollution emissions that range from $1.7 to $5.5 million per year. Finally, we argue that the effect is likely to be even stronger in other regions of the United States.


Abhijit Banerjee and Esther Duflo have put together what promises to be an excellent new book collecting the array of new advances made in development economics: Poor Economics.

The book isn't out yet, but its website sets a new standard for how academics present their work.  It has interactive maps, datasets and a collection of quality studies (each chapter has a long list of relevant reading).

Check it out: pooreconomics.com


Stata blog post on understanding matrices (with bonus Stata cheat sheet)

William Gould on Stata's blog (previously mentioned here) has two great posts (here and here) on the intuition behind matrices and regression coefficients. The section on near-singular matrices is characteristically nice:
Singular matrices are an extreme case of nearly singular matrices, which are the bane of my existence here at StataCorp. Here is what it means for a matrix to be nearly singular: [see figure]
Nearly singular matrices result in spaces that are heavily but not fully compressed. In nearly singular matrices, the mapping from x to y is still one-to-one, but x's that are far away from each other can end up having nearly equal y values. Nearly singular matrices cause finite-precision computers difficulty. Calculating y = Ax is easy enough, but to calculate the reverse transform x = A⁻¹y means taking small differences and blowing them back up, which can be a numeric disaster in the making.
Both posts are great and I recommend them for anyone struggling with the intuition behind what exactly you're doing when you type in reg y x.
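
To make the quoted point about nearly singular matrices concrete, here is a small Python sketch. It is not taken from Gould's posts: the 2x2 matrix A, the test points, and the helper names apply_2x2 and solve_2x2 are all assumptions chosen purely for illustration.

```python
def apply_2x2(a, x):
    """Compute y = A x for a 2x2 matrix given as nested tuples."""
    return (a[0][0] * x[0] + a[0][1] * x[1],
            a[1][0] * x[0] + a[1][1] * x[1])


def solve_2x2(a, y):
    """Compute x = A^-1 y using the explicit 2x2 inverse formula."""
    (a11, a12), (a21, a22) = a
    det = a11 * a22 - a12 * a21  # close to zero when A is nearly singular
    return ((a22 * y[0] - a12 * y[1]) / det,
            (a11 * y[1] - a21 * y[0]) / det)


# Illustrative nearly singular matrix: the two rows are almost identical.
A = ((1.0, 1.0),
     (1.0, 1.0001))

# Two x's that are far apart...
x_a = (1.0, 1.0)
x_b = (101.0, -99.0)

# ...map to y's that differ by only about 0.01 in one coordinate.
y_a = apply_2x2(A, x_a)
y_b = apply_2x2(A, x_b)
print(y_a, y_b)

# Going backwards, a nudge of 0.001 in y moves the recovered x by about 10
# in each coordinate: small differences get blown back up.
x_rec = solve_2x2(A, y_a)
x_pert = solve_2x2(A, (y_a[0], y_a[1] + 0.001))
print(x_rec, x_pert)
```

The determinant here is about 0.0001, so dividing by it magnifies any error in y by a factor on the order of ten thousand, which is the "numeric disaster in the making" described in the quote.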

As an added bonus, earlier this week I stumbled across Kenneth Simon's excellent pdf cheat sheet of Stata commands for intermediate / advanced econometrics, here. I was trying to figure out a way to do something cute with distributed lag models and post-estimation tests, but the sheet covers everything from the simple but important (e.g., the difference between gen old = age >= 18 and gen old = age >= 18 if age<. ) to the arcane but potentially important (e.g., nonlinear hypothesis testing). If you're in applied work and use Stata I highly recommend flipping through it. I've already found several useful techniques I wasn't even aware existed.
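
The difference between the two gen commands comes down to how Stata orders missing values: numeric missing (.) compares as larger than any nonmissing number, so age >= 18 is true when age is missing. The following is only a Python analogy, not Stata code; the MISSING stand-in and the sample ages are assumptions made up for illustration.

```python
# Stand-in for Stata's numeric missing value "."; Stata orders missing above
# every nonmissing number, which +infinity mimics for comparison purposes.
MISSING = float("inf")

# Hypothetical sample: a child, an adult, and a respondent with unknown age.
ages = [12.0, 30.0, MISSING]

# Analogue of: gen old = age >= 18
# The unknown age is silently classified as "old".
old_naive = [int(age >= 18) for age in ages]

# Analogue of: gen old = age >= 18 if age < .
# Rows with missing age stay missing (None here) instead of becoming 1.
old_guarded = [int(age >= 18) if age < MISSING else None for age in ages]

print(old_naive)    # [0, 1, 1]
print(old_guarded)  # [0, 1, None]
```

The guarded version keeps the unknown case out of the "old" group, which is why the seemingly redundant if age<. qualifier matters.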

March 11 Pacific Tsunami

NOAA has put together a nice technical site with model output and gauge readings from the tsunami on Friday.  The EVL used the data to make the graphic on the top, the NYT put together the graphic on the bottom:

A team at the NYTimes has also put together a nice interactive explanation about how tsunamis are generated and an interactive map with images of the damage.


Maps + Data Presentation

Since Sol recently posted on both NOAA's lovely data visualization labs and an assortment of map resources, I figured it might be worth pointing our readers to Cartastrophe. From the site's about page:
There are a lot of bad maps out there. They lurk in brochures, on company websites, and in magazines. They confuse, they miscommunicate, and they make it hard to learn anything about the world. Sometimes they leave off Sicily. They’re made by people who have to rush against tight deadlines, by people who are pressured by their bosses or clients to make bad design choices because it “looks cool,” and by people who were thrust into map-making jobs without any training.

We learn a lot from seeing what went wrong in someone else’s experience. I hope to amuse, but also to educate — to help people (myself included) understand what the elements of a good map are. And maybe, just maybe, if people are better able to understand what makes up a bad map, they’ll start demanding better ones.

The site is effectively the cartographic counterpart to Andrew Gelman's consistently superb data presentation critiques, and offers a huge amount of useful and thoughtful advice. Highlights:
The last link is particularly nice for its reminder that as much as many of us love Google maps (I have difficulty imagining how I went through life before it...) it can still contain major errors.


Earth Magazine Article

A recent article of mine came out in Earth Magazine.  It's an abridged description of my PNAS paper, which I described in an earlier post.

The best part is that on their main site they have a voluntary "visitors poll" (I mentioned a different poll in an earlier post) asking visitors if high temperatures affect their productivity. The results at the time of this posting are:

For those interested in the large (but sometimes statistically underpowered) literature on thermal stress and human productivity, here is a review from the Indoor Air Quality Handbook (2001).