7.29.2013

Forward vs. reverse causal questions

Andrew Gelman has a thought-provoking post on asking "Why?" in statistics:
Consider two broad classes of inferential questions: 
1. Forward causal inference. What might happen if we do X? What are the effects of smoking on health, the effects of schooling on knowledge, the effect of campaigns on election outcomes, and so forth? 
2. Reverse causal inference. What causes Y? Why do more attractive people earn more money? Why do many poor people vote for Republicans and rich people vote for Democrats? Why did the economy collapse? [...] 
My question here is: How can we incorporate reverse causal questions into a statistical framework that is centered around forward causal inference. (Even methods such as path analysis or structural modeling, which some feel can be used to determine the direction of causality from data, are still ultimately answering forward causal questions of the sort, What happens to y when we change x?) 
My resolution is as follows: Forward causal inference is about estimation; reverse causal inference is about model checking and hypothesis generation.
Among many gems is this:
A key theme in this discussion is the distinction between causal statements and causal questions. When Rubin dismissed reverse causal reasoning as “cocktail party chatter,” I think it was because you can’t clearly formulate a reverse causal statement. That is, a reverse causal question does not in general have a well-defined answer, even in a setting where all possible data are made available. But I think Rubin made a mistake in his dismissal. The key is that reverse questions are valuable in that they focus on an anomaly—an aspect of the data unlikely to be reproducible by the current (possibly implicit) model—and point toward possible directions of model improvement.
 You can read the rest here.

7.26.2013

Pricing the clathrate gun hypothesis


In this week's Nature:
We calculate that the costs of a melting Arctic will be huge, because the region is pivotal to the functioning of Earth systems such as oceans and the climate. The release of methane from thawing permafrost beneath the East Siberian Sea, off northern Russia, alone comes with an average global price tag of $60 trillion in the absence of mitigating action — a figure comparable to the size of the world economy in 2012 (about $70 trillion). The total cost of Arctic change will be much higher. Much of the cost will be borne by developing countries, which will face extreme weather, poorer health and lower agricultural production as Arctic warming affects climate. All nations will be affected, not just those in the far north, and all should be concerned about changes occurring in this region. More modelling is needed to understand which regions and parts of the world economy will be most vulnerable.
Wikipedia on the clathrate gun hypothesis here. For scale, Costanza et al. calculated the annual value of the world's ecosystem services in 1997 at $16-54 trillion, or $23-79 trillion in today's dollars.

7.23.2013

Seismic externalities

Injection-Induced Earthquakes
William L. Ellsworth
Abstract: Earthquakes in unusual locations have become an important topic of discussion in both North America and Europe, owing to the concern that industrial activity could cause damaging earthquakes. It has long been understood that earthquakes can be induced by impoundment of reservoirs, surface and underground mining, withdrawal of fluids and gas from the subsurface, and injection of fluids into underground formations. Injection-induced earthquakes have, in particular, become a focus of discussion as the application of hydraulic fracturing to tight shale formations is enabling the production of oil and gas from previously unproductive formations. Earthquakes can be induced as part of the process to stimulate the production from tight shale formations, or by disposal of wastewater associated with stimulation and production. Here, I review recent seismic activity that may be associated with industrial activity, with a focus on the disposal of wastewater by injection in deep wells; assess the scientific understanding of induced earthquakes; and discuss the key scientific challenges to be met for assessing this hazard.
Perhaps an enterprising graduate student can figure out an optimal management strategy for this risk.

7.03.2013

Using Weather Data and Climate Model Output in Economic Analyses of Climate Change

After 5 (or 6?) rounds of revisions (a lesson to anyone thinking of writing an interdisciplinary review article...), this is finally published:
Using Weather Data and Climate Model Output in Economic Analyses of Climate Change
Review of Environmental Economics and Policy
Maximilian Auffhammer, Solomon M. Hsiang, Wolfram Schlenker and Adam Sobel
We tried to write this as a practical and gentle introduction and how-to manual for econometricians and other applied social scientists. I hope it's helpful.

6.05.2013

Souped-up Watercolor Regression

I introduced "watercolor regression" here on FE several months ago, after some helpful discussions with Andrew Gelman and our readers. Over the last few months, I've made a few upgrades that I think significantly increase the utility of this approach for people doing work similar to my own.

First, the original paper is now on SSRN and documents the watercolor approach, explaining its relationship to the more general idea of visual-weighting.
Visually-Weighted Regression 
Abstract: Uncertainty in regression can be efficiently and effectively communicated using the visual properties of statistical objects in a regression display. Altering the “visual weight” of lines and shapes to depict the quality of information represented clearly communicates statistical confidence even when readers are unfamiliar with the formal and abstract definitions of statistical uncertainty. Here we present examples where the color-saturation and contrast of regression lines and confidence intervals are parametrized by local measures of an estimate’s variance. The results are simple, visually intuitive and graphically compact displays of statistical uncertainty. This approach is generalizable to almost all forms of regression.
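As a rough illustration of what "parametrizing" visual weight by local variance can mean, here is a minimal Python sketch. The linear mapping, the `floor` parameter and the function name `visual_weights` are assumptions made for this example; they are not taken from the paper or from the posted Matlab code.

```python
def visual_weights(std_errors, floor=0.1):
    """Map local standard errors to saturation values in [floor, 1].

    The most precise point on the curve gets full saturation (1.0);
    the least precise point fades to `floor`. The linear rule is an
    illustrative assumption, not the paper's formula.
    """
    lo, hi = min(std_errors), max(std_errors)
    if hi == lo:
        # Uniform precision everywhere: draw everything at full weight.
        return [1.0 for _ in std_errors]
    return [1.0 - (1.0 - floor) * (se - lo) / (hi - lo) for se in std_errors]


# Hypothetical standard errors of a fitted curve at five x-positions.
weights = visual_weights([0.5, 0.2, 0.1, 0.3, 0.9])
print(weights)
```

The returned values could then be used as the alpha or saturation of each line segment, so that the eye is drawn to the part of the curve where the estimate is best determined.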
Second, the Matlab code I've posted to do watercolor regression is now parallelized. If you have Matlab running on multiple processors, the code automatically detects this and runs the bootstrap procedure in parallel. This is helpful because a large number of resamples (>500) is important for getting the distribution of estimates (the watercolored part of the plot) to converge, but serial resampling gets very slow for large data sets (e.g. >1M obs), especially when block-bootstrapping (see below).

Third, the code now has an option to run a block bootstrap. This is important if you have data with serial or spatial autocorrelation (e.g. models of crop yields that change in response to weather). To see this at work, suppose we have some data where there is a weak dependence of Y on X, but all observations within a block (e.g. obs within a single year) have a uniform level-shift induced by some unobservable process.
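% Simulate 1000 obs in 10 blocks, each block sharing a level shift
e = randn(1000,1);              % idiosyncratic noise
block = repmat([1:10]',100,1);  % block ID for each obs (cycles 1..10)
x = 2*randn(1000,1);
y = x+10*block+e;               % weak dependence on x, large block-level shift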
The scatter of this data looks like:


where each stripe of data is a block of obs with correlated residuals. Running watercolor_reg without block-bootstrapping
 watercolor_reg(x,y,100,1.25,500)
we get an exaggerated sense of precision in the relationship between Y and X:


If we try to account for the fact that residuals within a block are not independent by using the block bootstrap
watercolor_reg(x,y,100,1.25,500,block)
we get a very different result:



Finally, the last addition to the code is a simple option to clip the watercoloring at the edge of a specified confidence interval (default is 95%), an idea suggested by Ted Miguel. This gives us a watercolor plot that also lets us conduct some traditional hypothesis tests visually, without violating the principles of visual weighting. Applying this option to the example above
blue = [0 0 .3]
watercolor_reg(x,y,100,1.25,500,block, blue,'CLIPCI')
we obtain a plot with a clear 95% CI, where the likelihoods within the CI are indicated by watercoloring:


Code is here. Enjoy!
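For readers without Matlab, here is a rough Python sketch of the comparison made in the block-bootstrap example above. It is not a port of watercolor_reg: it fits a plain straight line instead of a smoother, and the helper names (`ols_fit`, `bootstrap_intercepts`) are made up for illustration. It regenerates data of the same form as the example and compares how much the fitted level at x = 0 moves around when resampling single observations versus resampling whole blocks.

```python
import random
import statistics


def ols_fit(xs, ys):
    """Return (intercept, slope) of a least-squares line."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def bootstrap_intercepts(xs, ys, blocks, n_boot, rng, by_block):
    """Bootstrap the fitted intercept, resampling obs or whole blocks."""
    n = len(xs)
    ids = sorted(set(blocks))
    members = {b: [i for i in range(n) if blocks[i] == b] for b in ids}
    out = []
    for _ in range(n_boot):
        if by_block:
            # Draw block IDs with replacement and keep each block intact.
            drawn = rng.choices(ids, k=len(ids))
            idx = [i for b in drawn for i in members[b]]
        else:
            # Ordinary bootstrap: draw individual observations.
            idx = rng.choices(range(n), k=n)
        intercept, _ = ols_fit([xs[i] for i in idx], [ys[i] for i in idx])
        out.append(intercept)
    return out


rng = random.Random(0)
n = 1000
blocks = [i % 10 + 1 for i in range(n)]          # cycles 1..10, as in the example
xs = [2 * rng.gauss(0, 1) for _ in range(n)]
ys = [x + 10 * b + rng.gauss(0, 1) for x, b in zip(xs, blocks)]

iid_se = statistics.stdev(bootstrap_intercepts(xs, ys, blocks, 200, rng, False))
block_se = statistics.stdev(bootstrap_intercepts(xs, ys, blocks, 200, rng, True))
print("obs-level bootstrap SE of fitted level:", iid_se)
print("block bootstrap SE of fitted level:   ", block_se)
```

With block-level shifts this large, the spread under block resampling should come out several times wider than under observation-level resampling, which is the exaggerated precision described above.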

6.03.2013

Weather and Climate Data: a Guide for Economists

Now posted as an NBER working paper (it should be out in REEP this summer):

Using Weather Data and Climate Model Output in Economic Analyses of Climate Change
Maximilian Auffhammer, Solomon M. Hsiang, Wolfram Schlenker, Adam Sobel
Abstract: Economists are increasingly using weather data and climate model output in analyses of the economic impacts of climate change. This article introduces weather data sets and climate models that are frequently used, discusses the most common mistakes economists make in using these products, and identifies ways to avoid these pitfalls. We first provide an introduction to weather data, including a summary of the types of datasets available, and then discuss five common pitfalls that empirical researchers should be aware of when using historical weather data as explanatory variables in econometric applications. We then provide a brief overview of climate models and discuss two common and significant errors often made by economists when climate model output is used to simulate the future impacts of climate change on an economic outcome of interest.

5.27.2013

Hurricane-induced migration [Plot of the Week]

Impulse:

Hurricane Katrina, as pictured in the Gulf of Mexico at 21:45 UTC on August 28, 2005.

Response:

This map illustrates the national scope of the dispersion of refugees from Hurricane Katrina. It shows the location by zip code of the 800,000 displaced Louisiana residents who requested federal emergency assistance. The evacuees ended up dispersed across the entire nation, illustrating the wide-ranging impacts that can flow from extreme weather events, some of which are projected to increase in frequency and/or intensity as climate continues to change. (Source: Louisiana Geographic Information Center 2005)

5.17.2013

Climate Change: Recent Discoveries and Future Challenges

A conference at Columbia's Lamont campus this week, if you find yourself in the NYC area.

Climate Change: Recent Discoveries and Future Challenges

Abrupt Climate Change Studies Symposium
Cooperative Institute for Climate Applications and Research

21-23 May 2013
Columbia University Lamont Campus
Monell Building Auditorium

Speakers and agenda here
Register here

5.15.2013

Mathematics of Planet Earth

Here's an initiative that will suit FE readers:
More than a hundred scientific societies, universities, research institutes, and organizations all over the world have banded together to dedicate 2013 as a special year for the Mathematics of Planet Earth.
The initiative is multi-pronged, with everything from summer schools and curriculum materials to public lectures and a daily blog (which is quite good).

The idea behind "Mathematics of Planet Earth" (MPE), according to MPE:

5.13.2013

What is the debate over climate and conflict about?

Last week, Andrew Solow published a Nature comment titled "A call for peace on climate and conflict." In the article, Solow raises many important points that I whole-heartedly agree with, such as trying to avoid data-mining, looking deep into statistical models when they disagree, engaging with qualitative researchers, and presenting and publishing across research communities. My coauthors and I agree so strongly with these latter points that we regularly present and engage with researchers outside of our field -- e.g. Marshall Burke recently presented at the International Studies Association (a political science meeting) and I recently presented at the Association of American Geographers, at an interdisciplinary water resources conference at UCSD, and I will be presenting to a community of medical doctors at Harvard today.

However, I worry that Solow's comment may confuse readers as to why there is controversy in the field. Solow begins his comment:
Among the most worrying of the mooted impacts of climate change is an increase in civil conflict as people compete for diminishing resources, such as arable land and water [1]. Recent statistical studies [2–4] reporting a connection between climate and civil violence have attracted attention from the press and policy-makers, including US President Barack Obama. Doubts about such a connection have not been as widely aired [5–7], but a fierce battle has broken out within the research community.
The battle lines are not always clear, but on one side are the ‘quants’, who use quantitative methods to identify correlations between conflict and climate in global or regional data sets. On the other side are the ‘quals’, who study individual conflicts in depth. They argue that the factors that underlie civil conflict are more complex than the quants allow and that the reported correlations are statistical artefacts. 
where the papers he references are:
1. Homer-Dixon (Princeton Press, 1999).
2. Miguel, Satyanath, Sergenti, J.Polit. Econ. (2004).
3. Burke, Miguel, Satyanath, Dykema, Lobell, Proc. Natl Acad. Sci. USA (2009).
4. Hsiang, Meng, Cane, Nature (2011).
5. Buhaug, Proc. Natl Acad. Sci. USA (2010).
6. Theisen, Holtermann, Buhaug, Internatl Secur. (2011).
7. Buhaug, Hegre, Strand, (Peace Research Institute of Oslo, 2010). 
Thus, the dispute that motivates the comment (referenced in the first paragraph) is the disagreement between Miguel-Burke-Hsiang et al. and Buhaug-Theisen et al., while the second paragraph shifts the discussion to a dispute between ‘quants’ and ‘quals’ (which is the topic of most of the text). Because these two discussions are so intermingled, a careless reader might incorrectly conclude that the Miguel-Burke-Hsiang vs. Buhaug-Theisen debate is the qual vs. quant debate. This is not the case. Miguel-Burke-Hsiang et al. and Buhaug-Theisen et al. are all quantitative research groups. The debate between the two groups is about how quantitative research should be executed and interpreted. It is not a debate over whether quantitative or qualitative methods are better.

Because the Miguel-Burke-Hsiang vs. Buhaug-Theisen debate is raised in the comment, but not outlined, I summarize the papers that Solow cites here:

2004: Miguel et al. demonstrate that annual fluctuations in rainfall are negatively correlated with annual fluctuations in GDP growth and positively correlated with civil conflict in African countries. Miguel et al argue that rainfall changes influence conflict through this economic channel.

2009: Burke et al. (which includes Miguel and Satyanath, both authors on the 2004 paper) revisit this problem but include growing season temperature in their statistical model, motivated in part by other findings that temperature is a strong predictor of agricultural performance (even once rainfall is controlled for). They find that temperature appears to have an even stronger effect on conflict than rainfall. They conduct a number of robustness checks and project how conflict might change under global warming.

2010: Buhaug (PNAS) argues that Burke et al. arrive at incorrect conclusions because they should not include country fixed effects or country-specific trends in their statistical model. Buhaug instead advocates for a model that assumes all countries are identical (with respect to conflict) except for GDP and an index of political exclusion. Using this model, Buhaug argues that temperature has zero effect on conflict. Buhaug concludes his article with the statement:
"The challenges imposed by future global warming are too daunting to let the debate on social effects and required countermeasures be sidetracked by atypical, nonrobust scientific findings and actors with vested interests."
This is when the debate begins to get attention (e.g. here).

2010: Buhaug et al. (PRIO) examine several additional dimensions of the result in Burke et al., such as its out of sample prediction and how results look when other measures of civil conflict are used. The authors conclude:
"In conclusion, the sensitivity assessments documented here reveal little support for the alleged positive association between warming and higher frequency of major civil wars in Africa… More research is needed to get a better understanding of the full range of possible social dimensions of climate change."
2011: Theisen et al. revisit civil conflict in Africa by trying to pinpoint the locations where the first battle deaths in major wars occurred. Theisen et al. examine whether the 0.5 degree pixels where these first deaths occurred were experiencing drought at the time of these deaths. The authors follow Buhaug and do not use fixed effects; instead they use a model that assumes all pixels are identical except for six control variables (e.g. democracy, infant mortality). The authors do not find a statistically significant association between drought and the location of first battle death, so they conclude that climate does not affect civil conflict in Africa.

2011: Hsiang et al. examine whether the global climate (not local temperature) has any effect on global rates of civil conflict. Hsiang et al. identify the tropical and sub-tropical regions of the world that are most strongly affected by the El Niño-Southern Oscillation (ENSO) and then examine the likelihood that countries in this region start new civil conflicts, conditional on the state of ENSO. They find that in cooler/wetter La Niña years the rate of conflicts is half of what it is in hotter/drier El Niño years -- but only in the tropical and sub-tropical regions that are affected by this global climate oscillation. The authors show that the additional conflicts observed in El Niño years only occur after El Niño begins and are focused in the poorest countries.

Some of my thoughts on the above debate (in no particular order):
  1. Clearly, this discussion is all based on statistical evidence -- it is not a debate as to whether quals or quants are better suited to answer this question.
  2. No statistical evidence undermining the findings of Hsiang et al. has been released or published in the last two years (to my knowledge). Many authors have casually stated in reviews that "there are issues with the paper" or that Buhaug (2010) or Theisen et al. (2011) disprove our findings (e.g. here). But valid "issues" have not been pointed out to me, publicly or privately, and I do not see how these other papers can possibly be interpreted as disproving our results. Since I'm fairly certain that these authors have been trying to find problems with our paper, but have not released them anytime in the last two years, I am gaining confidence that our findings are extremely robust. Furthermore, one of Chris Blattman's graduate students recently replicated our paper successfully for an econometrics assignment.
  3. Buhaug and Theisen et al. generally overstate their findings. The estimates they obtain are extremely noisy, so they have very large confidence intervals, preventing them from rejecting a "zero effect" or very large effects. This is far from proving there is zero effect. For example, saying that X is somewhere between -100 and 100 is not evidence that X is exactly equal to 0. 
  4. Buhaug and Theisen et al.'s approach of dropping fixed effects, and assuming Africa is homogeneous except for a handful of controls, is easily rejected by the data. A simple F-test for the joint significance of the fixed effects in Burke's model easily rejects their hypothesis that these effects are the same throughout Africa. 
  5. I think the paper by Theisen et al. is very difficult to interpret, since they are assigning all the potential causes of a conflict to conditions within the 50 x 50 km pixel where the first battle death occurred. Regardless of what results they report or whether the statistical techniques are sound, I'm not sure how I would interpret any of their results since I tend to think that many factors located beyond that pixel would affect the likelihood of civil war in a country.
  6. There is a general argument underlying all the Buhaug-Theisen articles that "because regression coefficients change a lot across our models, the result of Burke must be non-robust." But this is faulty statistical logic. If the regression coefficients are changing between models, this means that all the models (or all but one) are mis-specified because they have different omitted variables, which is causing a different amount of bias in each model (and thus the different regression coefficients). This does not imply that the "true effect" of climate is equal to zero. There can only be one true effect. A good model might identify this effect and be robust to small variations in the model, but the true relationship between any X and Y cannot be generally "non-robust" and presenting non-robust estimates certainly does not prove that the true effect is zero.
  7. Plotting the results in Burke et al. is pretty compelling evidence. There is some noise (which is what drives the Buhaug claims) but just plotting the data early on might have prevented all this controversy (perhaps I am dreaming). 
  8. I think Miguel and Satyanath should be praised for revisiting their 2004 findings, including an additional and important control variable and then altering their conclusions based on their new findings.
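Points 4 and 6 can be illustrated with a small simulation. The Python sketch below is purely hypothetical: the numbers, the variable names and the data-generating process are invented for the example and are not taken from any of the papers discussed. It builds a panel in which each "country" has its own baseline that happens to be correlated with its average climate, then compares a pooled regression (no fixed effects) with a within-country regression (country means removed, which is equivalent to including country fixed effects).

```python
import random


def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def demean_by_group(values, groups):
    """Subtract each group's mean (the 'within' transformation)."""
    totals, counts = {}, {}
    for v, g in zip(values, groups):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


rng = random.Random(1)
true_effect = 0.5                      # assumed effect of x on y
countries, xs, ys = [], [], []
for c in range(40):                    # 40 hypothetical countries
    mean_temp = rng.gauss(0, 2)        # country's average climate
    # Baseline outcome correlated with average climate: the omitted variable.
    baseline = -1.0 * mean_temp + rng.gauss(0, 0.5)
    for _ in range(30):                # 30 years per country
        x = mean_temp + rng.gauss(0, 1)
        y = baseline + true_effect * x + rng.gauss(0, 1)
        countries.append(c)
        xs.append(x)
        ys.append(y)

pooled = slope(xs, ys)
within = slope(demean_by_group(xs, countries), demean_by_group(ys, countries))
print("true effect:          ", true_effect)
print("pooled (no FE) slope: ", pooled)
print("within-country slope: ", within)
```

In this setup the two models disagree sharply, but only because the pooled model omits the country baselines. The disagreement says nothing about whether the true effect is zero; by construction it is 0.5.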
Note: I am not opposed to qualitative research. However, I do think that qualitative researchers must carefully consider the limited extent of their observations when drawing inferences.  Large scale political conflict is a rare event, so it is unlikely that a randomly sampled case study will observe conflict in association with climatic events, even if there is a strong relationship. More discussion of this point is here.

My coauthor Marshall Burke has some additional thoughts on Solow's Comment and the general debate on G-FEED.

5.02.2013

Getting in touch with our feelings

If the goal of our work is to improve global human welfare, we should be finding ways to measure it.

The Expression of Emotions in 20th Century Books
Alberto Acerbi, Vasileios Lampos, Philip Garnett, R. Alexander Bentley
Abstract: We report here trends in the usage of “mood” words, that is, words carrying emotional content, in 20th century English language books, using the data set provided by Google that includes word frequencies in roughly 4% of all books published up to the year 2008. We find evidence for distinct historical periods of positive and negative moods, underlain by a general decrease in the use of emotion-related words through time. Finally, we show that, in books, American English has become decidedly more “emotional” than British English in the last half-century, as a part of a more general increase of the stylistic divergence between the two variants of English language.
Historical periods of positive and negative moods. Difference between z-scores of Joy and Sadness for years from 1900 to 2000 (raw data and smoothed trend). Values above zero indicate generally ‘happy’ periods, and values below the zero indicate generally ‘sad’ periods.

People are unhappy during economic depressions and world wars...

h/t Brenda

5.01.2013

The Long Term Impacts of Child Sponsorship

 Figure 1a: Hopefulness, 93rd percentile
A few weeks ago the economics blogs and popular press picked up a forthcoming JPE paper on child sponsorship by Bruce Wydick (also in the economics department here at USF), Paul Glewwe, and Laine Rutledge. The paper uses a clever identification strategy involving age-eligibility rules and sibling order to isolate the effect of charity child sponsorship on adult outcomes, and finds extraordinarily large positive effects on everything ranging from educational attainment to income to civil engagement. You can find an ungated version of the paper here.

4.25.2013

Toilets


Effects of Rural Sanitation on Infant Mortality and Human Capital: Evidence from India's Total Sanitation Campaign
Dean Spears
Abstract: Open defecation without a toilet or latrine is among the leading global threats to health, especially in India. Although it is well-known that modern sewage infrastructure improves health, it is unknown whether a sanitation program feasible for a low capacity, poor country government could be effective. This paper contributes the first causally identified estimates of effects of rural sanitation on health and human capital accumulation. The Indian government's Total Sanitation Campaign reports building one household pit latrine per ten rural persons from 2001 to 2011. The program offered local governments a large ex post monetary incentive to eliminate open defecation. I use several complementary identification strategies to estimate the program's effect on children's health. First, I exploit variation in program timing, comparing children born in different years. Second, I study a long difference-in-differences in aggregate mortality. Third, I exploit a discontinuity designed into the monetary incentive. Unlike many impact evaluations, this paper studies a full-scale program implemented by a large government bureaucracy with low administrative capacity. At the mean program intensity, infant mortality decreased by 4 per 1,000 and children's height increased by 0.2 standard deviations (similar to the cross-sectional difference associated with doubling household consumption per capita). These results suggest that, even in the context of governance constraints, incentivizing local leaders to promote technology adoption can be an effective strategy.
How much international variation in child height can sanitation explain?
Dean Spears
Physical height is an important economic variable reflecting health and human capital. Puzzlingly, however, differences in average height across developing countries are not well explained by differences in wealth. In particular, children in India are shorter, on average, than children in Africa who are poorer, on average, a paradox called “the Asian enigma” which has received much attention from economists. This paper provides the first documentation of a quantitatively important gradient between child height and sanitation that can statistically explain a large fraction of international height differences. This association between sanitation and human capital is robustly stable, even after accounting for other heterogeneity, such as in GDP. The author applies three complementary empirical strategies to identify the association between sanitation and child height: country-level regressions across 140 country-years in 65 developing countries; within-country analysis of differences over time within Indian districts; and econometric decomposition of the India-Africa height differences in child-level data. Open defecation, which is exceptionally widespread in India, can account for much or all of the excess stunting in India.


Perhaps the most disturbing thing of all is the simple summary statistic that there are many regions where >50% of households do not have toilets.

4.23.2013

Self-control and long run outcomes


A gradient of childhood self-control predicts health, wealth, and public safety
Terrie E. Moffitt et al.
Abstract: Policy-makers are considering large-scale programs aimed at self-control to improve citizens’ health and wealth and reduce crime. Experimental and economic studies suggest such programs could reap benefits. Yet, is self-control important for the health, wealth, and public safety of the population? Following a cohort of 1,000 children from birth to the age of 32 y, we show that childhood self-control predicts physical health, substance dependence, personal finances, and criminal offending outcomes, following a gradient of self-control. Effects of children’s self-control could be disentangled from their intelligence and social class as well as from mistakes they made as adolescents. In another cohort of 500 sibling-pairs, the sibling with lower self-control had poorer outcomes, despite shared family background. Interventions addressing self-control might reduce a panoply of societal costs, save taxpayers money, and promote prosperity.
 Related results from China's One Child Policy here.

4.18.2013

1-800-CLOUD-GONE

Ever been sitting by a window in the space station and felt annoyed that clouds are obstructing your view? Charlie Loyd and Chris Herwig of Mapbox (covered before on FE) have a simple but clever solution: sort your data by pixel.

Their explanation is clearer than mine (pun intended). I just wanted to post the pretty pictures.

Before:


After:


I think this idea has several applications beyond clearing the skies.
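To make "sort your data by pixel" concrete, here is a toy Python sketch. It assumes a stack of co-registered grayscale images of the same scene, and it assumes that clouds are the brightest values a pixel ever takes, so that picking a low-ranked value from each pixel's sorted history tends to return the ground. The function name `cloud_free_composite`, the `rank` parameter and the tiny 2 x 2 images are all made up for illustration; the actual Mapbox pipeline may differ.

```python
def cloud_free_composite(stack, rank=0.25):
    """Build a composite by sorting each pixel's values across images.

    stack: list of images, each a list of rows of brightness values in [0, 1].
    rank:  which sorted value to keep (0 = darkest, 1 = brightest).
    """
    n_rows, n_cols = len(stack[0]), len(stack[0][0])
    pick = int(rank * (len(stack) - 1))
    composite = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            # Sort this pixel's brightness over all images in the stack.
            values = sorted(img[r][c] for img in stack)
            row.append(values[pick])
        composite.append(row)
    return composite


# Five hypothetical 2x2 images; values near 0.9 play the role of clouds.
stack = [
    [[0.20, 0.95], [0.30, 0.40]],
    [[0.90, 0.25], [0.31, 0.42]],
    [[0.21, 0.24], [0.92, 0.41]],
    [[0.22, 0.26], [0.29, 0.97]],
    [[0.19, 0.23], [0.32, 0.39]],
]
composite = cloud_free_composite(stack)
print(composite)
```

Every image in the toy stack has a "cloud" somewhere, yet the composite has none, because each pixel is chosen independently from its own sorted history.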

h/t Young