Stata blog post on understanding matrices (with bonus Stata cheat sheet)

William Gould on Stata's blog (previously mentioned here) has two great posts (here and here) on the intuition behind matrices and regression coefficients. The section on near-singular matrices is characteristically nice:
Singular matrices are an extreme case of nearly singular matrices, which are the bane of my existence here at StataCorp. Here is what it means for a matrix to be nearly singular: [see figure]
Nearly singular matrices result in spaces that are heavily but not fully compressed. In nearly singular matrices, the mapping from x to y is still one-to-one, but x's that are far away from each other can end up having nearly equal y values. Nearly singular matrices cause finite-precision computers difficulty. Calculating y = Ax is easy enough, but to calculate the reverse transform x = A⁻¹y means taking small differences and blowing them back up, which can be a numeric disaster in the making.
Both posts are great and I recommend them for anyone struggling with the intuition behind what exactly you're doing when you type in reg y x.
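
To make the near-singular case concrete, here is a minimal sketch in plain Python (not Stata, and not taken from Gould's posts). The matrix A, the points x_a and x_b, and the helper names apply_2x2 and solve_2x2 are all made up for illustration. It shows two far-apart x's landing on nearly the same y, and a tiny nudge to y producing a very different x when we invert.

```python
def apply_2x2(a, x):
    """Forward transform: return y = a x for a 2x2 matrix a."""
    return [a[0][0] * x[0] + a[0][1] * x[1],
            a[1][0] * x[0] + a[1][1] * x[1]]


def solve_2x2(a, y):
    """Reverse transform: return x with a x = y, via the explicit 2x2 inverse."""
    (a11, a12), (a21, a22) = a
    det = a11 * a22 - a12 * a21  # tiny when a is nearly singular
    return [(a22 * y[0] - a12 * y[1]) / det,
            (a11 * y[1] - a21 * y[0]) / det]


# Hypothetical nearly singular matrix: row 2 is almost a copy of row 1.
A = [[1.0, 1.0],
     [1.0, 1.0 + 1e-10]]

x_a = [1.0, 1.0]
x_b = [2.0, 0.0]  # far away from x_a

y_a = apply_2x2(A, x_a)
y_b = apply_2x2(A, x_b)

# Far-apart x's end up with nearly equal y's.
y_gap = max(abs(p - q) for p, q in zip(y_a, y_b))

# Nudge y_a by 1e-10 in one coordinate, then map back to x.
y_noisy = [y_a[0], y_a[1] + 1e-10]
x_noisy = solve_2x2(A, y_noisy)

# The recovered x lands far from x_a even though y barely moved.
x_shift = max(abs(p - q) for p, q in zip(x_noisy, x_a))

print("gap between y_a and y_b:", y_gap)
print("x recovered from nudged y:", x_noisy)
print("shift in x:", x_shift)
```

The forward step only adds and multiplies numbers of ordinary size, so it is harmless. The reverse step divides by a determinant of about 1e-10, which is where the small differences get blown back up.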

As an added bonus, earlier this week I stumbled across Kenneth Simons's excellent PDF cheat sheet of Stata commands for intermediate / advanced econometrics, here. I was trying to figure out a way to do something cute with distributed lag models and post-estimation tests, but the sheet covers everything from the simple but important (e.g., the difference between gen old = age >= 18 and gen old = age >= 18 if age < ., which matters because Stata treats missing values as larger than any number, so the first version codes everyone with a missing age as old) to the arcane but potentially important (e.g., nonlinear hypothesis testing). If you're in applied work and use Stata I highly recommend flipping through it. I've already found several useful techniques I wasn't even aware existed.
