[This is the overdue earth-shattering sequel to this earlier post.]
I recently posted this working paper online. It's very short, so you should probably just read it (I was actually originally going to write it as a blog post), but I'll run through the basic idea here.
Since I'm proposing a method, I've written functions in Matlab (vwregress.m) and Stata (vwlowess.ado) to accompany the paper. You can download them here, but I expect that other folks can do a much better job implementing this idea.
Solomon M. Hsiang
Abstract: Uncertainty in regression can be efficiently and effectively communicated using the visual properties of regression lines. Altering the "visual weight" of lines to depict the quality of information represented clearly communicates statistical confidence even when readers are unfamiliar or reckless with the formal and abstract definitions of statical uncertainty. Here, we present an example by decreasing the color-saturation of nonparametric regression lines when the variance of estimates increases. The result is a simple, visually intuitive and graphically compact display of statistical uncertainty. This approach is generalizable to almost all forms of regression.
Here's the issue. Statistical uncertainty seems to be important for two different reasons. (1) If you have to make a decision based on data, you want to have a strong understanding of the possible outcomes that might result from your decision, which itself rests on how we interpret the data. This is the "standard" logic, I think, and it requires a precise, quantitative estimate of uncertainty. (2) Because there is noise in data, and because sampling is uneven across independent variables, a lot of data analysis techniques generate artifacts that we should mostly just ignore. We are often unnecessarily focused/intrigued by the wackier results that shows up in analyses, but thinking carefully about statistical uncertainty reminds us to not focus too much on these features. Except when it doesn't.
"Visually-weighted regression" is a method for presenting regression results that tries to address issue (2), taking a normal person's psychological response to graphical displays into account. I had grown a little tired of talks and referee reports where people speculate about the cause of some strange non-linearity at the edge of a regression sample, where there was no reason to believe the non-linear structure was real. I think this and related behaviors emerge because (i) there seems to be an intellectual predisposition to thinking that "nonlinearity" is inherently more interesting that "linearity" and (ii) the traditional method for presenting uncertainty subconsciously focuses viewers attention on features of the data that are less reliable. I can't solve issue (i) with data visualization, but we can try to fix (ii).
The goal of visually-weighted regression is to take advantage of viewer's psychological response to images in order to focus their attention on the results that are the most informative. "Visual weight" is a concept from art and graphical design that is used to to direct a viewer's focus within an image. Large, dark, high-contrast, and complex structures tend to "grab" a viewer's attention. Our brains are constantly looking for visual information and, somewhere along the evolutionary line, detailed/high-contrast structures in our field of view were probably more informative and more useful for survival, so we are programmed to give them more of our attention. Unfortunately, the traditional approaches to displaying statistical uncertainty give more visual weight to the uncertain portions of the analysis, which is exactly backwards of what we want. Ideally, a viewer will focus more of their attention on the portions of analysis that have some statical confidence and they will mostly ignore the portions of analysis that are so uncertain that they contain little or no information.
[continued below the fold]