6.05.2013

Souped-up Watercolor Regression

I introduced "watercolor regression" here on FE several months ago, after some helpful discussions with Andrew Gelman and our readers. Over the last few months, I've made a few upgrades that I think significantly increase the utility of this approach for people doing work similar to my own.

First, the original paper is now on SSRN. It documents the watercolor approach and explains its relationship to the more general idea of visual weighting.
Visually-Weighted Regression 
Abstract: Uncertainty in regression can be efficiently and effectively communicated using the visual properties of statistical objects in a regression display. Altering the “visual weight” of lines and shapes to depict the quality of information represented clearly communicates statistical confidence even when readers are unfamiliar with the formal and abstract definitions of statistical uncertainty. Here we present examples where the color-saturation and contrast of regression lines and confidence intervals are parametrized by local measures of an estimate’s variance. The results are simple, visually intuitive and graphically compact displays of statistical uncertainty. This approach is generalizable to almost all forms of regression.
Second, the Matlab code I've posted to do watercolor regression is now parallelized. If you have Matlab running on multiple processors, the code automatically detects this and runs the bootstrap procedure in parallel. This is helpful because a large number of resamples (>500) is important for getting the distribution of estimates (the watercolored part of the plot) to converge, but serial resampling gets very slow for large data sets (e.g. >1M obs), especially when block-bootstrapping (see below).
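
To illustrate why the bootstrap parallelizes so easily, here is a minimal sketch. It is not the posted watercolor_reg code: the variable names are made up for illustration, and a straight-line fit (polyfit) stands in for the smoother. Each resample is independent of the others, so the loop can be written as a parfor.

```matlab
% Sketch only: not the posted watercolor_reg implementation.
% Assumption: a straight-line fit (polyfit) stands in for the smoother.
n = 1000;                         % number of observations (illustrative)
x = 2*randn(n,1);
y = x + randn(n,1);
nboot = 500;                      % number of bootstrap resamples
slopes = zeros(nboot,1);
% parfor uses the Parallel Computing Toolbox when it is available;
% without the toolbox the iterations simply run serially.
parfor b = 1:nboot
    idx = randi(n, n, 1);         % draw n observations with replacement
    p = polyfit(x(idx), y(idx), 1);
    slopes(b) = p(1);             % keep the slope from this resample
end
fprintf('bootstrap SE of slope: %.3f\n', std(slopes));
```

Because no iteration reads anything written by another, the only shared output is the preallocated vector that each iteration fills at its own index.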

Third, the code now has an option to run a block bootstrap. This is important if you have data with serial or spatial autocorrelation (e.g. models of crop yields that change in response to weather). To see this at work, suppose we have some data where there is a weak dependence of Y on X, but all observations within a block (e.g. obs within a single year) share a uniform level-shift induced by some unobservable process.
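e = randn(1000,1);               % idiosyncratic noise
block = repmat((1:10)',100,1);   % block label (1 to 10) for each obs
x = 2*randn(1000,1);             % regressor, independent of block
y = x+10*block+e;                % unit slope plus a block-level shift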
The scatter of this data looks like:


where each of the stripes of data is a block of obs with correlated residuals. Running watercolor_reg without block-bootstrapping
watercolor_reg(x,y,100,1.25,500)
we get an exaggerated sense of precision in the relationship between Y and X:


If we try to account for the fact that residuals within a block are not independent by using the block bootstrap
watercolor_reg(x,y,100,1.25,500,block)
we get a very different result:



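The difference between the two resampling schemes can be sketched as follows. This is an illustration under stated assumptions, not the posted watercolor_reg code: a straight-line fit (polyfit) stands in for the smoother, and the variable names are hypothetical. The ordinary bootstrap draws individual observations, while the block bootstrap draws whole blocks and keeps every observation in a drawn block together.

```matlab
% Sketch only: not the posted watercolor_reg implementation.
% Assumption: a straight-line fit (polyfit) stands in for the smoother.
e = randn(1000,1);
block = repmat((1:10)',100,1);
x = 2*randn(1000,1);
y = x + 10*block + e;

nboot = 500;
ids = unique(block);              % the distinct block labels
nb = numel(ids);
n = numel(y);
fit_iid = zeros(nboot,1);         % fitted value at x = 0, obs-level resampling
fit_blk = zeros(nboot,1);         % fitted value at x = 0, block resampling
for b = 1:nboot
    % Ordinary bootstrap: resample individual observations.
    idx = randi(n, n, 1);
    p = polyfit(x(idx), y(idx), 1);
    fit_iid(b) = p(2);            % intercept = fitted value at x = 0
    % Block bootstrap: resample whole blocks with replacement,
    % keeping all observations of each drawn block together.
    draw = ids(randi(nb, nb, 1));
    idx = [];
    for k = 1:nb
        idx = [idx; find(block == draw(k))];
    end
    p = polyfit(x(idx), y(idx), 1);
    fit_blk(b) = p(2);
end
fprintf('SE of fit at x=0, obs-level: %.2f\n', std(fit_iid));
fprintf('SE of fit at x=0, block:     %.2f\n', std(fit_blk));
```

In this setup the block-level shifts dominate the variation in Y, so the spread of the fitted level should come out much larger under block resampling, since there are effectively only ten independent draws of the shift rather than a thousand.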
Finally, the last addition to the code is a simple option to clip the watercoloring at the edge of a specified confidence interval (default is 95%), an idea suggested by Ted Miguel. This gives us a watercolor plot that also lets us conduct some traditional hypothesis tests visually, without violating the principles of visual weighting. Applying this option to the example above
blue = [0 0 .3];
watercolor_reg(x,y,100,1.25,500,block, blue,'CLIPCI')
we obtain a plot with a clear 95% CI, where the likelihoods within the CI are indicated by watercoloring:


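One plausible way to think about the clipping is as a percentile interval taken across the bootstrap curves at each value of X. The sketch below makes that concrete under stated assumptions: it is not the posted watercolor_reg code, straight-line fits stand in for the smoothed curves, and the grid and variable names are hypothetical.

```matlab
% Sketch only: not the posted watercolor_reg implementation.
% Assumption: straight-line fits stand in for the smoothed curves.
x = 2*randn(1000,1);
y = x + randn(1000,1);
n = numel(y);
nboot = 500;
xg = linspace(-4, 4, 50);         % grid on which curves are evaluated
curves = zeros(nboot, numel(xg)); % one bootstrap curve per row
for b = 1:nboot
    idx = randi(n, n, 1);
    curves(b,:) = polyval(polyfit(x(idx), y(idx), 1), xg);
end
% Percentile interval at each grid point: sort each column and read
% off the 2.5th and 97.5th percentiles (sort avoids toolbox functions).
alpha = 0.05;
s = sort(curves, 1);
lo = s(max(1, floor(nboot*alpha/2)), :);
hi = s(min(nboot, ceil(nboot*(1 - alpha/2))), :);
% Clipping: bootstrap values outside [lo, hi] are masked out with NaN.
inside = bsxfun(@ge, curves, lo) & bsxfun(@le, curves, hi);
clipped = curves;
clipped(~inside) = NaN;
fprintf('share of values kept: %.3f\n', mean(inside(:)));
```

The density of the surviving values inside the interval is what the watercoloring would shade, while the edges of the interval give the band used for a conventional visual test.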
Code is here. Enjoy!

6.03.2013

Weather and Climate Data: a Guide for Economists

Now posted as an NBER working paper (it should be out in REEP this summer):

Using Weather Data and Climate Model Output in Economic Analyses of Climate Change
Maximilian Auffhammer, Solomon M. Hsiang, Wolfram Schlenker, Adam Sobel
Abstract: Economists are increasingly using weather data and climate model output in analyses of the economic impacts of climate change. This article introduces weather data sets and climate models that are frequently used, discusses the most common mistakes economists make in using these products, and identifies ways to avoid these pitfalls. We first provide an introduction to weather data, including a summary of the types of datasets available, and then discuss five common pitfalls that empirical researchers should be aware of when using historical weather data as explanatory variables in econometric applications. We then provide a brief overview of climate models and discuss two common and significant errors often made by economists when climate model output is used to simulate the future impacts of climate change on an economic outcome of interest.