10.08.2012

Mashup: watercolor regression of reported rapes and daily temperature in US counties

I was working with Matthew Ranson's crime and temperature data recently for a review article when Andrew Gelman tossed in his two cents on replotting the main figures. So I figured I'd see how one of the plots looked as a watercolor regression, a recent innovation that arose from discussions on FE and Gelman's blog (watercolor regression is a type of visually-weighted regression).

I found the rape-vs-temperature plot particularly striking/perplexing/upsetting/interesting (yes, county, month-by-county and year-by-county effects have been removed from the data), so that is the one I replotted. I converted the number of rape cases reported each month into a percentage of the mean monthly number of reported rape cases. Temperature is the monthly mean (across days) of daily maximum temperature. Darker coloration depicts a higher probability that the conditional mean takes a given value, and the estimated mean is the thin white line.
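To make the rescaling concrete, here is a minimal sketch in Python (not the code used for the plot). The function name `percent_of_mean` and the numbers are made up for illustration, and it assumes the mean is taken over the whole sample, which the description above does not pin down.

```python
import numpy as np

def percent_of_mean(counts):
    """Express each monthly count as a percentage of the sample mean count.

    Assumption: the mean is taken over all observations passed in,
    not separately by county.
    """
    counts = np.asarray(counts, dtype=float)
    return 100.0 * counts / counts.mean()

# Made-up monthly counts, for illustration only.
monthly_cases = [2, 4, 6, 8]
print(percent_of_mean(monthly_cases))
```

With this scaling, a value of 100 means "an average month", so the vertical axis reads directly as a percentage deviation from typical.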


This is the largest sample I've run the watercolor regression code on (N > 1.4M), but it took less than ten minutes to plot with a few hundred resamples and 300 bins in the x-variable. For a first attempt at using the code on real data, I'm pretty satisfied with how clearly it depicts uncertainty without distracting from the main message (an issue described here).
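For readers who want to see the resample-and-bin idea in code, here is a rough Python sketch. It is not the Matlab or R implementation: the function name `watercolor_density` and all of its parameters are hypothetical, and it swaps in simple binned means as the smoother, which is an assumption made to keep the example short. For each bootstrap resample it estimates the conditional mean in every x bin, then records how often those estimates land in each y cell.

```python
import numpy as np

def watercolor_density(x, y, n_xbins=30, n_ybins=50, n_boot=200, seed=0):
    """Bootstrap density of a binned-mean estimate of E[y | x].

    Returns x bin edges, y bin edges, a (n_ybins, n_xbins) array whose
    columns each sum to 1 (or are all zero for empty x bins), and the
    full-sample binned mean.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)

    x_edges = np.linspace(x.min(), x.max(), n_xbins + 1)
    # Bin index of every observation, in 0 .. n_xbins - 1.
    idx = np.clip(np.digitize(x, x_edges) - 1, 0, n_xbins - 1)

    def binned_mean(bin_idx, values):
        sums = np.bincount(bin_idx, weights=values, minlength=n_xbins)
        counts = np.bincount(bin_idx, minlength=n_xbins)
        with np.errstate(invalid="ignore", divide="ignore"):
            return sums / counts  # NaN where a bin is empty

    # One row of conditional-mean estimates per bootstrap resample.
    boot = np.empty((n_boot, n_xbins))
    for b in range(n_boot):
        pick = rng.integers(0, n, size=n)  # sample rows with replacement
        boot[b] = binned_mean(idx[pick], y[pick])

    # Share of resampled means falling in each y cell, per x bin.
    y_edges = np.linspace(np.nanmin(boot), np.nanmax(boot), n_ybins + 1)
    density = np.zeros((n_ybins, n_xbins))
    for j in range(n_xbins):
        col = boot[:, j]
        col = col[~np.isnan(col)]
        if col.size:
            hist, _ = np.histogram(col, bins=y_edges)
            density[:, j] = hist / hist.sum()

    return x_edges, y_edges, density, binned_mean(idx, y)
```

The `density` array can then be drawn as a grayscale image (for example with matplotlib's `pcolormesh(x_edges, y_edges, density, cmap="Greys")`) with the full-sample mean overlaid as a thin white line. Because binning and counting are vectorized, the cost grows roughly linearly with the sample size times the number of resamples.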

[I've posted a Matlab function to compute and plot watercolor regressions here and Felix Schönbrodt posted an implementation in R here.]
