IV & 2SLS
IV was originally developed to combat attenuation bias in ordinary least squares, but somebody (I don't know who) realized it could be used to estimate local average treatment effects in experimental settings where not all subjects in a treatment group actually received the treatment of interest. From there, someone figured out that IV could be used in retrospective studies to isolate treatment effects when assignment to treatment had some endogenous component. This last interpretation is where I think a lot of folks run into trouble. (If you don't know what any of this means, you probably won't care about the rest of this post...)
All of these conceptual innovations were big contributions. But what happened afterwards led to a lot of sloppy thinking. IV is often taught as a method for "removing" endogeneity from an independent variable (variable X) in a regression. That is strictly correct (if some relatively strong assumptions are met), but it often feels abused. My impression from papers, talks, and discussions is that most students think of IV the way they think of a band-pass filter: if you find a plausible instrument (variable Z), you can "filter out" the bad (endogenous) part of your independent variable and keep only the good (exogenous) part. This is a useful heuristic for teaching, but overusing it leads to bad intuition.
IV with one instrument
The first stage regression in IV "filters out" all variation in X that does not project (for now, linearly) onto Z. This linear projection is just Z*b, but it is usually written as "X-hat". That notation is not mathematically incorrect, and it is appealing for heuristic purposes, but I think it is why a lot of students get confused. It makes students think that X-hat is just like X, except without all the bad endogenous parts. But really, X-hat is more like Z than it is like X. In fact, X-hat IS Z! It's just linearly rescaled by b, which is only a change in units (plus a shift of origin if the first stage includes a constant, which I ignore here). X-hat is exogenous whenever Z is exogenous simply because
X-hat = Z * b
where b is a constant. It seems silly and patronizing to reiterate an equation that is taught in metrics 101 a gazillion times, but people seem to forget that this little rescaling equation is doing all the heavy lifting in an IV model. (I bet that if we never used "X-hat" notation and we renamed "the first stage" something less exciting, like "linearly converting units," then grad students would think IV is much less magical...)
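To see this numerically, here is a minimal sketch in Python with NumPy. The data-generating process, including the unobserved confounder `u` that makes `x` endogenous, is a hypothetical choice for illustration only. The first-stage fitted values `x_hat` come out perfectly correlated with `z` (they are `z` in different units), and they pick up the confounder only through sampling noise.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
n = 1_000

# Hypothetical DGP: u is an unobserved confounder that makes x endogenous.
z = rng.uniform(0, 1, n)
u = rng.normal(0, 1, n)
x = 2 * z + u + rng.normal(0, 0.5, n)

# First stage: OLS of x on a constant and z; x_hat are the fitted values.
Z = np.column_stack([np.ones(n), z])
coef, *_ = np.linalg.lstsq(Z, x, rcond=None)
x_hat = Z @ coef

# x_hat is an exact affine function of z, so their correlation is 1
# up to floating-point error: x_hat is z measured in different units.
print(np.corrcoef(x_hat, z)[0, 1])
# x is strongly correlated with the confounder; x_hat only by sampling noise.
print(np.corrcoef(x, u)[0, 1], np.corrcoef(x_hat, u)[0, 1])
```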
Once X-hat is estimated, it is used as the new regressor in the "second stage" equation
Y = X-hat * a + error
and then a big fuss is made about how a is an unbiased (strictly speaking, consistent) estimate of the effect of X on Y. But if we drop the X-hat notation and replace it with the rescaled-Z notation, we get something less exciting:
Y = Z * b * a + error
where a is now the effect of Z on Y, rescaled by a factor of 1/b. For those familiar with IV, this looks a lot like the reduced form equation
Y = Z * c + error
which was always valid because Z is exogenous. Clearly,
a = c / b.
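A quick numerical check of a = c / b, as a sketch under an assumed (hypothetical) data-generating process with a confounder `u`. With one instrument and a constant in each regression, the manual second-stage slope equals the reduced-form slope divided by the first-stage slope exactly in any sample, not just on average (this is the indirect least squares, or Wald, ratio).

```python
import numpy as np

def ols(regressor, outcome):
    """OLS of outcome on a constant and one regressor; returns [intercept, slope]."""
    design = np.column_stack([np.ones_like(regressor), regressor])
    return np.linalg.lstsq(design, outcome, rcond=None)[0]

rng = np.random.default_rng(1)  # arbitrary seed
n = 5_000
z = rng.uniform(0, 1, n)
u = rng.normal(0, 1, n)                    # hypothetical confounder
x = 2 * z + u + rng.normal(0, 0.5, n)
y = x + 2 * u + rng.normal(0, 0.5, n)      # true effect of x on y is 1

b0, b = ols(z, x)           # first stage
c = ols(z, y)[1]            # reduced form
x_hat = b0 + b * z
a = ols(x_hat, y)[1]        # second stage

print(a, c / b)             # identical up to floating-point error, both near 1
print(ols(x, y)[1])         # naive OLS of y on x, pushed upward by u
```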
If you're still reading, you should be yawning, since this all seems very boring: these are all just variations on the same reduced form equation. And that's my main point, which I don't think is taught enough: instrumental variables is only as good as your reduced form, because it is a linear rescaling of your reduced form. Of course, there are adjustments to standard errors, but if you're doing IV to remove bias, then you were focused on your point estimate to begin with. (An aside: it drives me nuts when people are "uninterested" in the reduced form effect of natural variation [e.g. weather] on some outcome, but then are fascinated if you use the same variation as an instrument for some otherwise endogenous intermediate variable. The latter regression is the same as the former, only the coefficient has been rescaled by 1/b!)
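The standard-error adjustment is the one thing a hand-rolled second stage gets wrong. Running the second stage manually (as `reg y x_hat` does in the Stata script at the end of this post) reproduces the 2SLS point estimate, but its residuals are built from X-hat rather than X. Proper 2SLS keeps the same coefficients and the same (X-hat'X-hat)^-1 matrix but computes residuals with the actual X. A sketch under a hypothetical data-generating process, using conventional homoskedastic formulas that divide by n - k (`ivregress` by default uses a large-sample formula without that degrees-of-freedom adjustment):

```python
import numpy as np

def homoskedastic_se(design, resid):
    """Conventional standard errors sqrt(diag(s^2 (D'D)^-1)), with s^2 = e'e / (n - k)."""
    n, k = design.shape
    s2 = resid @ resid / (n - k)
    return np.sqrt(np.diag(s2 * np.linalg.inv(design.T @ design)))

rng = np.random.default_rng(2)  # arbitrary seed
n = 5_000
z = rng.uniform(0, 1, n)
u = rng.normal(0, 1, n)                    # hypothetical confounder
x = 2 * z + u + rng.normal(0, 0.5, n)
y = x + 2 * u + rng.normal(0, 0.5, n)

ones = np.ones(n)
Z = np.column_stack([ones, z])
x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]        # first stage
X_hat = np.column_stack([ones, x_hat])
coef = np.linalg.lstsq(X_hat, y, rcond=None)[0]         # manual second stage

# Naive: residuals from the manual second stage, built with x_hat.
se_naive = homoskedastic_se(X_hat, y - X_hat @ coef)[1]
# 2SLS: same coefficients and same (X_hat'X_hat)^-1, but residuals built with x.
se_2sls = homoskedastic_se(X_hat, y - np.column_stack([ones, x]) @ coef)[1]
print(coef[1], se_naive, se_2sls)
```

In this particular setup the naive standard error comes out too large, but the direction of the error depends on the data-generating process.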
Enough ranting. How do we visualize this? It's as easy as it sounds: you rescale the horizontal axis of your reduced form regression.
To illustrate this, I simulated 100 data points with the equations
Z ~ uniform on [0,1]
X = Z * 2 + e1
Y = X + e2
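where e1 and e2 are independent normal noise terms with mean zero and standard deviation 0.5 (matching rnormal()*.5 in the Stata script at the end of this post). I then plot the reduced form regression of Y on Z as the blue line in the upper-left panel: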
I then estimate the first stage regression in the lower-left panel and show the predicted values X-hat as open red circles. In the lower-right panel, I just plot X-hat against X-hat to show that I am reflecting these values from the vertical axis (ordinate) to the horizontal axis (abscissa). Then, keeping X-hat on the horizontal axis, I plot Y against X-hat in the upper-right panel. This is the second stage of IV, which gives us our estimate of the coefficient a (the slope of the red line).
[The code to generate these plots in Stata is at the bottom of this post.]
What is different between the scatter on the upper left (blue line, representing the reduced form regression) and the scatter on the upper right (red line, representing the second stage of IV)? Not much. The vertical axis is the same and the relative locations of the data points are the same. The only change is that the horizontal variable has been rescaled by the estimated first-stage slope b, which is roughly two (recall the data generating process above). The IV regression is just a rescaled version of the reduced form. What happened to X? It was left behind in the lower-left panel and it never went anywhere else. It only feels like it is somehow present in the upper-right panel because we renamed Z*b the glitzier X-hat.
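A Python rendition of the simulation above makes the same point without pictures. This is a sketch: the seed is arbitrary, so it will not reproduce the exact points in the figure. Because X-hat is an increasing affine function of Z, every point in the upper-right panel keeps its height, its left-to-right order, and its relative horizontal spacing from the upper-left panel.

```python
import numpy as np

rng = np.random.default_rng(3)  # arbitrary seed, so not the draw shown in the figure
n = 100
z = rng.uniform(0, 1, n)
e1 = rng.normal(0, 0.5, n)      # sd 0.5, as in rnormal()*.5 in the Stata script
e2 = rng.normal(0, 0.5, n)
x = 2 * z + e1
y = 1 * x + e2

# First stage: x_hat = b0 + b*z.
Z = np.column_stack([np.ones(n), z])
b0, b = np.linalg.lstsq(Z, x, rcond=None)[0]
x_hat = b0 + b * z

# Upper-left panel plots (z, y); upper-right plots (x_hat, y).
# The heights are identical; check that the horizontal positions keep
# their order and their relative spacing.
same_order = np.array_equal(np.argsort(z), np.argsort(x_hat))
same_spacing = np.allclose((z - z.min()) / (z.max() - z.min()),
                           (x_hat - x_hat.min()) / (x_hat.max() - x_hat.min()))
print(b, same_order, same_spacing)
```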
Multiple instruments (2SLS)
Sometimes people have more than one exogenous variable that influences X (e.g. temperature and rainfall). Both of these variables can be used as instruments in the first stage. What happens to the intuition above when we do this? Not much, except things get harder to draw. But the fact that it's harder to draw doesn't mean that the estimate is necessarily more impressive or magical.
Suppose we now have instruments Z1 and Z2 such that
Multivariate first stage: X = Z1 - Z2
Second stage: Y = 2 * X
Then the first stage regression (X-hat, the green surface) looks like this:
where the two horizontal axes are Z1 and Z2 and the vertical axis is X.
If we substitute this first stage regression into the second stage, we get
Y = 2 * (Z1 - Z2)
We can easily plot this version of Y as a function of Z1 and Z2 (the reduced form). Here, it's the purple plane:
The resulting second stage regression would take the observed value for Y (purple) and project it onto the predicted values for X (green). The parameter of interest (2 here, but the variable "a" earlier) is just the ratio of the height of the purple surface to the height of the green surface.
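Here is a sketch of the two-instrument case with noise added to the equations above, again in Python with a hypothetical confounder `u`. The second-stage slope comes out near 2, and so does the ratio of each reduced-form coefficient to the matching first-stage coefficient, which is the coefficient-level version of "purple height divided by green height."

```python
import numpy as np

def fitted(design, outcome):
    """OLS fitted values of outcome on the columns of design."""
    return design @ np.linalg.lstsq(design, outcome, rcond=None)[0]

rng = np.random.default_rng(4)  # arbitrary seed
n = 20_000
z1, z2 = rng.normal(0, 1, n), rng.normal(0, 1, n)
u = rng.normal(0, 1, n)                     # hypothetical confounder
x = z1 - z2 + u + rng.normal(0, 0.5, n)     # first stage from the text, plus noise
y = 2 * x + u + rng.normal(0, 0.5, n)       # second stage from the text, plus noise

ones = np.ones(n)
Z = np.column_stack([ones, z1, z2])
x_hat = fitted(Z, x)                        # green surface at the data points

# Second stage: regress y on a constant and x_hat.
a = np.linalg.lstsq(np.column_stack([ones, x_hat]), y, rcond=None)[0][1]
# Reduced-form and first-stage coefficients on (z1, z2).
c = np.linalg.lstsq(Z, y, rcond=None)[0][1:]
pi = np.linalg.lstsq(Z, x, rcond=None)[0][1:]
print(a, c / pi)   # all close to 2
```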
Again, everything can be stated in terms of the instruments Z1 and Z2 (the two horizontal axes), which means the reduced form is again just as good as the second stage. X only enters by determining how steep the green surface is.
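One way to make "everything can be stated in terms of the instruments" precise, assuming a single endogenous regressor X and demeaned variables so the constants drop out: write the first stage and the reduced form in matrix form. Because Z'Z ĉ = Z'Y, the 2SLS estimate depends on the data only through the first-stage coefficients, the reduced-form coefficients, and Z'Z.

```latex
\begin{aligned}
\hat{\pi} &= (Z'Z)^{-1}Z'X, & \hat{X} &= Z\hat{\pi} \quad \text{(first stage, green)}\\
\hat{c}   &= (Z'Z)^{-1}Z'Y, & \hat{Y} &= Z\hat{c} \quad \text{(reduced form, purple)}\\
\hat{a}   &= \frac{\hat{X}'Y}{\hat{X}'\hat{X}}
           = \frac{\hat{\pi}'Z'Z\,\hat{c}}{\hat{\pi}'Z'Z\,\hat{\pi}}
           = \frac{\hat{X}'\hat{Y}}{\hat{X}'\hat{X}}
\end{aligned}
```

With a single instrument, π̂ = b and ĉ = c are scalars and the middle expression collapses to a = c / b. The last expression is a regression (through the origin) of the purple heights on the green heights.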
Nonlinear first stage with multiple instruments
Finally, some people use a nonlinear first stage. Again, this is not terribly different. Suppose we have
Nonlinear multivariate first stage: X = Z1^2 - Z2
Second stage: Y = 2 * X
Then the first stage looks like:
and the reduced form regression of Y on Z1 and Z2 gives us
Y = 2 * (Z1^2 - Z2)
which we overlay in purple again:
Since the second stage didn't change, the purple surface is still everywhere exactly twice as high (measured from zero) as the green surface, so a scatter plot of purple vs. green values would be a straight line with slope 2 = a. This graph looks fancier, but the instruments Z1 and Z2 are still driving everything. The endogenous variable X only enters passively by determining the height of the green surface. Once that's done, Z1 and Z2 do the rest.
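The purple-versus-green scatter can be checked directly. In this sketch (Python, hypothetical confounder `u`, noise added to the equations above), the "nonlinear" first stage is just ordinary 2SLS with Z1^2 and Z2 as instruments, since the first stage is still linear in its coefficients. Regressing the fitted purple heights on the fitted green heights returns exactly the 2SLS slope, because Y and its reduced-form fit differ only by a residual that is orthogonal to everything built from Z1^2 and Z2.

```python
import numpy as np

def ols(design, outcome):
    """OLS coefficients of outcome on the columns of design."""
    return np.linalg.lstsq(design, outcome, rcond=None)[0]

rng = np.random.default_rng(5)  # arbitrary seed
n = 20_000
z1, z2 = rng.normal(0, 1, n), rng.normal(0, 1, n)
u = rng.normal(0, 1, n)                          # hypothetical confounder
x = z1**2 - z2 + u + rng.normal(0, 0.5, n)       # nonlinear first stage from the text, plus noise
y = 2 * x + u + rng.normal(0, 0.5, n)            # second stage from the text, plus noise

ones = np.ones(n)
Z = np.column_stack([ones, z1**2, z2])           # nonlinear in z1, but linear in parameters
x_hat = Z @ ols(Z, x)                            # green surface at the data points
y_rf = Z @ ols(Z, y)                             # purple surface at the data points

X_hat = np.column_stack([ones, x_hat])
a_2sls = ols(X_hat, y)[1]                        # second stage: y on x_hat
a_surfaces = ols(X_hat, y_rf)[1]                 # purple heights on green heights
print(a_2sls, a_surfaces)   # equal up to floating-point error, and close to 2
```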
Takeaway: Reduced form models can be very interesting. However, instrumental variables models are rarely much more interesting. In fact, all the additional interestingness of the IV model arises from the exclusion restriction that is assumed, but this assumption is usually false (recall: how many papers use weather as an instrument?), so the IV model is probably exactly as interesting as the reduced form, except that it has larger standard errors and the wrong units.
[If you disagree, send me a clear visualization of your 2SLS results and all the steps you used to get there. I'd love to be wrong here.]
[Below is the Stata script that generates the first figure in the post.]
clear
//arbitrary seed so the simulated draws are reproducible
set seed 12345
set obs 100
//generate the instrument
gen z = runiform()
//generate noise terms (standard deviation 0.5)
gen e1 = rnormal()*.5
gen e2 = rnormal()*.5
//generate the variable to be instrumented for
gen x = 2*z + e1
//generate the outcome
gen y = 1*x + e2
//first stage: regress x on z and save the fitted values x_hat
reg x z
predict x_hat, xb
lab var x_hat "x_hat = z*b"
//naive OLS, for comparison
reg y x
//manual second stage: same point estimate as 2SLS, but incorrect standard errors
reg y x_hat
//built-in 2SLS with correct standard errors
ivregress 2sls y (x = z)
//the four panels
tw (sc y z)(lfit y z, lcolor(blue)), tit(reduced form) ytit(y) saving(reduced_form, replace) legend(off)
tw (sc x z)(sc x_hat z, mcolor(red) m(Oh)), tit(first stage) ytit(x) saving(first_stage, replace) legend(off)
tw (sc x_hat x_hat, mcolor(red) m(Oh)), tit(reflection) ytit(x_hat = z*b) saving(reflection, replace) legend(off)
tw (sc y x_hat)(lfit y x_hat, lcolor(red)), tit(second stage) ytit(y) saving(second_stage, replace) legend(off)
//combine the panels and export (replace lets the script be rerun)
graph combine reduced_form.gph second_stage.gph first_stage.gph reflection.gph, xcommon ycommon
graph export "IV_graph.pdf", replace