8.20.2012

Visually-weighted confidence intervals

Following up on my earlier post describing visually weighted regression (paper here) and some suggestions from Andrew Gelman and others, I adjusted my Matlab function (vwregress.m, posted here) and just thought I'd visually document some of the options I've tried.

All of this information is available if you type help vwregress once the program is installed, but I think looking at the pictures helps.

The basic visually weighted regression is just a conditional mean where the visual weight of the line reflects uncertainty.  Personally, I like the simple non-parametric plot overlaid with the OLS regression since its clean and helps us see whether a linear approximation is a reasonable fit or not:
vwregress(x, y, 300, .5,'OLS',[0 0 1])
 


Confidence intervals (CI) can be added, and visually-weighted according to the same scheme as the conditional mean:
vwregress(x, y, 300, .5, 200,[0 0 1]);

Since the CI band is bootstrapped, Gelman suggested that we overlay the spaghetti plot of resampled estimates, I added the option 'SPAG' to do this. If the spaghetti are plotted using a solid color (option 'SOLID'), this ends up looking quasi-visually-weighted:
vwregress(x, y, 300, .5,'SPAG','SOLID',200,[0 0 1]);

But since it gets kind of nasty looking near the edges, where the estimates go little haywire since the observations get thin, we can visually weight the spaghetti too to keep it from getting distracting (just omit the 'SOLID' option).
vwregress(x, y, 300, .5,'SPAG',200,[0 0 1]);

Gelman also suggested that we try approximating the spaghetti by smoothly filling in the CI band, using the original visual-weighting scheme. To do this, I added the 'FILL' option. I like the result quite a bit (even more than the spaghetti, but others may disagree). [Warning: this plotting combination may be very slow, especially with a lot of resamples.]
vwregress(x, y, 300, .5,'FILL',200,[0 0 1]);

If 'SOLID' is combined with 'FILL', only the conditional mean is plotted with solid coloring. (This differs from 'SPAG' and the simple CI bounds).
vwregress(x, y, 300, .5,'FILL','SOLID',200,[0 0 1]);

Finally, I included the 'CI' option which changes the visual-weighting scheme from using weights 1/sqrt(N) to using 1/(CI_max - CI_min), where CI_max is the upper limit (point-wise) of the CI and CI_min is the lower limit. 

I like this because if we combine this with 'FILL', then the confidence band "conserves ink" (which we equate with confidence) in the y-dimension. Imagine that we squirt out ink uniformly to draw the conditional mean and then smear the ink vertically so that it stretches from the lower confidence bound to the upper confidence bound.  In places where the CI band is narrow, this will cause very little spreading of the ink so the CI band will be dark. But in places where the CI band is wide, the ink is smeared a lot so it gets lighter. For any vertical sliver of the CI band (think dx) the amount of ink displayed (integrated along a vertical line) will be constant.
vwregress(x, y, 300, .5,'CI','FILL',200,[0 0 1]);

For Stata users, I have written vwlowess.ado (here), but unfortunately it does not yet have any of these options.

Lucas Leeman has implemented some of these ideas in R (see here), so maybe he'll make that code available.

All of the above plots were made with the random data:
x = randn(200,1); 
e = randn(200,1).*(1+abs(x)).^1.5; 
y = 2*x+x.^2+4*e;

3 comments:

  1. Solomon: Great idea and beautiful figures. I did an R version of these, implementing a slightly different approach: http://www.nicebread.de/visually-weighted-regression-in-r-a-la-solomon-hsiang/

    Felix

    ReplyDelete
    Replies
    1. Beautiful! I posted an almost identical solution here and added it as an option in the original Matlab code here. I think its excellent to have an implementation in R as well. Thanks!

      Delete
  2. Yes, we came to the same solution using vertical densities. Independent convergence - I think that is a good sign :-)

    ReplyDelete