Watercolor regression

I'm in a rush, so I will explain this better later. But Andrew Gelman posted my idea for a type of visually-weighted regression that I jokingly called "watercolor regression" without a picture, so it's a little tough to see what I was talking about. Here is the same email but with the pictures to go along with it. The code to do your own watercolor regression is here as the 'SMOOTH' option to vwregress. (Update: I've added a separate, cleaner function watercolor_reg.m for folks who want to tinker with the code but don't want to wade through all the other options built into vwregress. Update 2: I've added watercolor regression as a second example in the revised paper here.)

This was the email with figures included:

Two small follow-ups based on the discussion (the second/bigger one is to address your comment about the 95% CI edges).

1. I realized that if we plot the confidence intervals as a solid color that fades (e.g., using the "fixed ink" scheme from before) we can make sure the regression line also has heightened visual weight where confidence is high by plotting the line white. This makes the contrast (and thus visual weight) between the regression line and the CI highest when the CI is narrow and dark. As the CI fades near the edges, so does the contrast with the regression line. This is a small adjustment, but I like it because it is so simple and it makes the graph much nicer.

My posted code has been updated to do this automatically.

2. You and your readers didn't like that the edges of the filled CI were so sharp and arbitrary. But I didn't like that the contrast between the spaghetti lines and the background had so much visual weight.  So to meet in the middle, I smoothed the spaghetti plot to get a nonparametric estimate of the probability that the conditional mean is at a given value:

To do this, after generating the spaghetti through bootstrapping, I estimate a kernel density of the spaghetti in the Y dimension for each value of X.  I set the visual-weighting scheme so it still "preserves ink" along a vertical line-integral, so the distribution dims where it widens since the ink is being "stretched out". To me, it kind of looks like a watercolor painting -- maybe we should call it a "watercolor regression" or something like that.

The watercolor regression turned out to be more of a coding challenge than I expected, because the bandwidth for the kernel smoothing has to adjust to the width of the CI. And since several people seem to like R better than Matlab, I attached 2 figs to show them how I did this. Once you have the bootstrapped spaghetti plot:

I defined a new coordinate system that spanned the range of bootstrapped estimates for each value in X.

The kernel smoothing is then executed along the vertical columns of this new coordinate system.
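
The two steps above (bootstrap the spaghetti, then smooth along each vertical column of a grid that spans that column's range) can be sketched in a few lines of plain Matlab. This is only a minimal illustration, not the code inside vwregress: it assumes a quadratic polyfit as a stand-in for the smoother, a Gaussian kernel, and a bandwidth set to a fixed fraction (the hypothetical bwFrac) of each column's bootstrap range.

```matlab
% Sketch of the column-wise smoothing step (NOT the vwregress internals).
% Assumptions: quadratic polyfit as a stand-in smoother, Gaussian kernel,
% bandwidth = bwFrac * (range of bootstrapped estimates in that column).
n = 100;
x = randn(n,1);
y = 2*x + x.^2 + 4*randn(n,1);

resamples = 200;                         % bootstrap replicates
xGrid = linspace(min(x), max(x), 50);    % columns of the plot
nRows = 80;                              % vertical cells per column
bwFrac = 0.1;                            % hypothetical bandwidth fraction

% 1. Bootstrap the spaghetti: one fitted curve per resample.
spag = zeros(resamples, numel(xGrid));
for r = 1:resamples
    idx = randi(n, n, 1);                % resample rows with replacement
    p = polyfit(x(idx), y(idx), 2);
    spag(r,:) = polyval(p, xGrid);
end

% 2. For each column, smooth the bootstrapped values along Y on a grid
%    that spans that column's own range (the rescaled coordinate system).
ink = zeros(nRows, numel(xGrid));
yCoord = zeros(nRows, numel(xGrid));
for j = 1:numel(xGrid)
    lo = min(spag(:,j));
    hi = max(spag(:,j));
    yCol = linspace(lo, hi, nRows)';     % column-specific Y grid
    h = bwFrac * (hi - lo);              % bandwidth scales with the spread
    dens = zeros(nRows,1);
    for r = 1:resamples
        dens = dens + exp(-0.5*((yCol - spag(r,j))/h).^2);
    end
    dy = (hi - lo)/(nRows - 1);
    ink(:,j) = dens / (sum(dens)*dy);    % ink integrates to 1 along Y
    yCoord(:,j) = yCol;
end
```

Because every column is normalized to the same total, wide columns come out paler and narrow ones darker. One way to draw the result would be pcolor(repmat(xGrid, nRows, 1), yCoord, ink) with shading interp and a white-to-color colormap.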

I've updated the code posted online to include this new option. This Matlab code will generate a similar plot using my vwregress function:

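% simulated data: quadratic signal plus noise
x = randn(100,1);
e = randn(100,1);
y = 2*x+x.^2+4*e;

bins = 200;          % number of bins
color = [.5 0 0];    % RGB triplet (dark red)
resamples = 500;     % bootstrap resamples
bw = 0.8;            % smoothing bandwidth

vwregress(x, y, bins, bw, resamples, color, 'SMOOTH');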

NOTE TO R USERS: The day after my email to Andrew, Felix Schönbrodt posted a nice similar variant with code in R here.

Update: For overlaid regressions, I still prefer the simpler visually-weighted line (last two figs here) since this is what overlaid watercolor regressions look like:

It might look better if the scheme made the blue overlay fade from blue-to-clear rather than blue-to-white, but then it would be mixing (in the color sense) with the red so the overlaid region would then start looking like very dark purple. If someone wants to code that up, I'd like to see it. But I'm predicting it won't look so nice.


High temperatures cause violent crime and implications for climate change

I've posted about high temperature inducing individuals to exhibit more violent behavior when driving, playing baseball and prowling bars. These cases are neat anecdotes that let us see the "pure aggression" response in lab-like conditions, but they don't affect most of us too much. Violent crime in the real world, however, affects everyone. Earlier, I posted a paper by Jacob et al. that looked at assault in the USA for about a decade: they found that higher temperatures lead to more assault and that violent crime rose more quickly than non-violent property crime, an indicator that there is a "pure aggression" component to the rise in violent crime.

A new working paper "Crime, Weather, and Climate Change" by recent Harvard grad Matthew Ranson puts together an impressive data set of all types of crime in USA counties for 50 years. The results tell the aggression story using street-level data very clearly:

Note that all crime increases as temperatures rise from 0 F to about 50 F. It seems reasonable to hypothesize that a lot of this pattern comes from "logistical constraints", e.g., it's hard to steal a car when it's covered in snow. But above 60 F, only the violent crimes continue to go up: murder, rape, and assault. The comparison between murder and manslaughter is elegantly telling, as manslaughter should be less motivated by malicious intent.

Ranson goes on to make projections about the expected effect of climate change:
Between 2010 and 2099, climate change will cause an additional 30,000 murders, 200,000 cases of rape, 1.4 million aggravated assaults, 2.2 million simple assaults, 400,000 robberies, 3.2 million burglaries, 3.0 million cases of larceny, and 1.3 million cases of vehicle theft in the United States.
This is pretty serious stuff. Ranson also shows that these effects haven't changed much over time, so the prospects for adaptation may be low. And there's no reason to believe that this relationship, which is probably neuro-physiological, doesn't hold outside of the USA.


Nonlinearities and exposure to extreme heat: what do we know?

There's been lots of talk about Hanson's work attributing extreme weather events to climate change. For a summary of some of our email discussions about the [ir]relevance of extremes and nonlinearities in measuring climate impacts, check out Marshall Burke's post in the G-FEED blog.


Two percent per degree Celsius

That's the magic number for how worker productivity responds to warm/hot temperatures.

In my 2010 PNAS paper, I found that labor-intensive sectors of national economies decreased output by roughly 2.4% per degree C and argued that this looked suspiciously like it came from reductions in worker output. Using a totally different method and dataset, Matt Neidell and Josh Graff Zivin found that labor supply in micro data fell by 1.8% per degree C. Both responses kicked in at around 26C.

Chris Sheehan just sent me this NYT article on air conditioning, where they mention this neat natural experiment:
[I]n the past year, [Japan] became an unwitting laboratory to study even more extreme air-conditioning abstinence, and the results have not been encouraging. After the Fukushima earthquake and tsunami knocked out a big chunk of the country’s nuclear power, the Japanese government mandated vastly reduced energy consumption. To that end, lights have been dimmed and air-conditioners turned down or off, so that offices comply with the government-prescribed indoor summer temperature of 82.4 degrees (28 Celsius); some offices have tried as high as 86. 
Unfortunately, studies by Shin-ichi Tanabe, a professor of architecture at Waseda University in Tokyo who has long been interested in “thermal comfort,” found that while workers tolerated dimmer light just fine, every degree rise in temperature above 25 Celsius (77 degrees Fahrenheit) resulted in a 2 percent drop in productivity. Over the course of the day that meant they accomplished 30 minutes less work, he said.
I have said before that empirical social science should strive to replicate results and obtain similar parameters. I think we are getting there on this one.

And in case anyone is [still] listening, I [still] think that persistently reduced labor productivity may be one of the largest economic impacts of anthropogenic climate change.

I couldn't locate the Tanabe study (it sounds like it might be in Japanese), but his lab looks really cool (pun intended, but also true): they focus almost exclusively on thermal comfort and productivity. Instead of the Fukushima study, Tanabe sent me this one, which is also relevant and contains the magic number:


Visually-weighted confidence intervals

Following up on my earlier post describing visually weighted regression (paper here) and some suggestions from Andrew Gelman and others, I adjusted my Matlab function (vwregress.m, posted here) and just thought I'd visually document some of the options I've tried.

All of this information is available if you type help vwregress once the program is installed, but I think looking at the pictures helps.

The basic visually weighted regression is just a conditional mean where the visual weight of the line reflects uncertainty. Personally, I like the simple non-parametric plot overlaid with the OLS regression since it's clean and helps us see whether a linear approximation is a reasonable fit or not:
vwregress(x, y, 300, .5,'OLS',[0 0 1])

Confidence intervals (CI) can be added, and visually-weighted according to the same scheme as the conditional mean:
vwregress(x, y, 300, .5, 200,[0 0 1]);

Since the CI band is bootstrapped, Gelman suggested that we overlay the spaghetti plot of resampled estimates, so I added the option 'SPAG' to do this. If the spaghetti are plotted using a solid color (option 'SOLID'), this ends up looking quasi-visually-weighted:
vwregress(x, y, 300, .5,'SPAG','SOLID',200,[0 0 1]);

But since it gets kind of nasty looking near the edges, where the estimates go a little haywire because the observations get thin, we can visually weight the spaghetti too to keep it from getting distracting (just omit the 'SOLID' option).
vwregress(x, y, 300, .5,'SPAG',200,[0 0 1]);

Gelman also suggested that we try approximating the spaghetti by smoothly filling in the CI band, using the original visual-weighting scheme. To do this, I added the 'FILL' option. I like the result quite a bit (even more than the spaghetti, but others may disagree). [Warning: this plotting combination may be very slow, especially with a lot of resamples.]
vwregress(x, y, 300, .5,'FILL',200,[0 0 1]);

If 'SOLID' is combined with 'FILL', only the conditional mean is plotted with solid coloring. (This differs from 'SPAG' and the simple CI bounds).
vwregress(x, y, 300, .5,'FILL','SOLID',200,[0 0 1]);

Finally, I included the 'CI' option which changes the visual-weighting scheme from using weights 1/sqrt(N) to using 1/(CI_max - CI_min), where CI_max is the upper limit (point-wise) of the CI and CI_min is the lower limit. 

I like this because if we combine this with 'FILL', then the confidence band "conserves ink" (which we equate with confidence) in the y-dimension. Imagine that we squirt out ink uniformly to draw the conditional mean and then smear the ink vertically so that it stretches from the lower confidence bound to the upper confidence bound.  In places where the CI band is narrow, this will cause very little spreading of the ink so the CI band will be dark. But in places where the CI band is wide, the ink is smeared a lot so it gets lighter. For any vertical sliver of the CI band (think dx) the amount of ink displayed (integrated along a vertical line) will be constant.
vwregress(x, y, 300, .5,'CI','FILL',200,[0 0 1]);
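
The "conserves ink" property is easy to check numerically. The snippet below is a minimal sketch, not the code inside vwregress: the bounds ciLo and ciHi are made up for illustration, and rescaling the weights so that the darkest column has shade 1 is an assumption about how a weight might be mapped to a color intensity.

```matlab
% Sketch of the 'CI' weighting idea (NOT the vwregress internals).
% ciLo and ciHi are made-up pointwise bounds, for illustration only.
xGrid = linspace(-3, 3, 7);
ciLo = xGrid.^2 - (1 + abs(xGrid));      % hypothetical lower bound
ciHi = xGrid.^2 + (1 + abs(xGrid));      % hypothetical upper bound

width = ciHi - ciLo;                     % CI_max - CI_min at each x
w = 1 ./ width;                          % raw weight: narrow band, more weight
shade = w / max(w);                      % assumed rescaling: darkest column = 1

% Ink in each vertical sliver = shade * width, the same for every x.
inkPerColumn = shade .* width;
disp(inkPerColumn);
```

Every entry of inkPerColumn is identical, which is the "constant ink along a vertical line" statement: doubling the width of the band halves its shade.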

For Stata users, I have written vwlowess.ado (here), but unfortunately it does not yet have any of these options.

Lucas Leeman has implemented some of these ideas in R (see here), so maybe he'll make that code available.

All of the above plots were made with the random data:
x = randn(200,1); 
e = randn(200,1).*(1+abs(x)).^1.5; 
y = 2*x+x.^2+4*e;


Declining public interest in the drought

David Lobell mentioned that there seemed to be less news coverage of the drought, so I checked Google Trends and David was right. Looking at just the USA, interest in the drought peaked about a week ago:

(News report volume looks similar, but Google doesn't give me the raw data.) Is interest/news falling because the nation's corn crop has recovered? Probably not. But a week ago, something else took over the airwaves and people's attention:

Is this spurious? It's possible, but this general pattern is well documented. In a 2007 article, David Strömberg linked the quantity of US disaster relief (a proxy for public interest) to "whether the disaster occurs at the same time as other newsworthy events, such as the Olympic Games, which are obviously unrelated to need." He concludes "that the only plausible explanation of this is that relief decisions are driven by news coverage of disasters and that the other newsworthy material crowds out this news coverage." So it isn't crazy to think that the London Games might soak up some of the public interest that would otherwise go towards our own drought.

In a closely related 2011 paper, Matthew Kahn and Matthew Kotchen showed that "an increase in a state's unemployment rate decreases Google searches for 'global warming' and increases searches for 'unemployment.'"

Yet, while it seems unlucky for folks in the Midwest to get hit by this drought during the Olympics, they are "lucky enough" to get hit just before the presidential race. In their 2007 paper, Thomas Garrett and Russell Sobel "find that presidential and congressional influences affect the rate of disaster declaration and the allocation of FEMA disaster expenditures across states. States politically important to the president have a higher rate of disaster declaration by the president... Election year impacts are also found. Our models predict that nearly half of all disaster relief is motivated politically rather than by need. The findings reject a purely altruistic model of FEMA assistance and question the relative effectiveness of government versus private disaster relief."

(cross posted on G-FEED)