Fight Entropy: identification


4.06.2015

Data-driven causal inference

Distinguishing cause from effect using observational data: methods and benchmarks

From the abstract:

The discovery of causal relationships from purely observational data is a fundamental problem in science. The most elementary form of such a causal discovery problem is to decide whether X causes Y or, alternatively, Y causes X, given joint observations of two variables X, Y. This was often considered to be impossible. Nevertheless, several approaches for addressing this bivariate causal discovery problem were proposed recently. In this paper, we present the benchmark data set CauseEffectPairs that consists of 88 different "cause-effect pairs" selected from 31 datasets from various domains. We evaluated the performance of several bivariate causal discovery methods on these real-world benchmark data and on artificially simulated data. Our empirical results provide evidence that additive-noise methods are indeed able to distinguish cause from effect using only purely observational data. In addition, we prove consistency of the additive-noise method proposed by Hoyer et al. (2009).

From the arxiv.org blog (note):

The basis of the new approach is to assume that the relationship between X and Y is not symmetrical. In particular, they say that in any set of measurements there will always be noise from various cause. The key assumption is that the pattern of noise in the cause will be different to the pattern of noise in the effect. That’s because any noise in X can have an influence on Y but not vice versa.

There's been a lot of research in statistics on "causal discovery" techniques, and the paper is in essence a horse race between additive-noise methods (ANM) and Information Geometric Causal Inference (IGCI), with ANM winning out. Some nice overview slides providing background are here.
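To make the additive-noise idea concrete, here is a minimal sketch in Python, assuming NumPy. It fits effect = f(cause) + noise in both directions and asks in which direction the residuals look independent of the regressor, scoring dependence with a biased HSIC estimate (Gaussian kernel, median-heuristic bandwidth). The cubic polynomial regression, the helper names (`hsic`, `residual_dependence`, `anm_direction`), and the simulated data are my own simplifications for illustration, not the paper's implementation; the methods benchmarked there use more flexible nonparametric regression and proper independence tests.

```python
import numpy as np

def _rbf_gram(v):
    """Gaussian-kernel Gram matrix on a standardized 1-D sample,
    with the bandwidth set by the median heuristic."""
    v = (v - v.mean()) / v.std()
    d2 = (v[:, None] - v[None, :]) ** 2
    med = np.median(d2[d2 > 0])
    return np.exp(-d2 / med)

def hsic(a, b):
    """Biased HSIC estimate: close to zero when a and b are independent."""
    n = len(a)
    h = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    k, l = _rbf_gram(a), _rbf_gram(b)
    return float(np.trace(k @ h @ l @ h) / (n - 1) ** 2)

def residual_dependence(cause, effect, degree=3):
    """Fit effect = f(cause) + noise with a polynomial f (a hypothetical
    stand-in for nonparametric regression) and score how strongly the
    residuals still depend on the putative cause."""
    coefs = np.polyfit(cause, effect, degree)
    resid = effect - np.polyval(coefs, cause)
    return hsic(cause, resid)

def anm_direction(x, y, degree=3):
    """Prefer the direction whose residuals look more independent."""
    fwd = residual_dependence(x, y, degree)  # model X -> Y
    bwd = residual_dependence(y, x, degree)  # model Y -> X
    return ("X->Y" if fwd < bwd else "Y->X"), fwd, bwd

# Simulated example: X causes Y through a nonlinear f plus uniform noise.
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 400)
y = x ** 3 + rng.uniform(-1, 1, 400)
direction, fwd, bwd = anm_direction(x, y)
print(direction, round(fwd, 4), round(bwd, 4))
```

In the true direction the residuals are essentially the injected noise, so they carry little information about X; the reverse fit leaves residuals whose spread changes with Y, which HSIC picks up. Real data are rarely this cooperative, which is exactly why benchmarks like CauseEffectPairs matter.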

12.09.2013

What is identification?

There are relatively few non-academic internet resources on identification and causal inference in the social sciences, especially of the sort that can be consumed by a nonspecialist. To remedy that slightly I decided to tidy up and post some slides I've used to give talks on causal inference a few times in the past year. They're aimed at senior undergrad or graduate students with at least some background in statistics or econometrics, and can be found here:

Causal Inference, Identification, and Identification Strategies

Feel free to drop me a line and give me feedback, especially if something seems unclear or incorrect. Thanks!

5.01.2013

The Long Term Impacts of Child Sponsorship

Figure 1a: Hopefulness, 93rd percentile

A few weeks ago the economics blogs and popular press picked up a forthcoming JPE paper on child sponsorship by Bruce Wydick (also in the economics department here at USF), Paul Glewwe, and Laine Rutledge. The paper uses a clever identification strategy involving age-eligibility rules and sibling order to isolate the effect of charity child sponsorship on adult outcomes, and finds extraordinarily large positive effects on everything from educational attainment to income to civic engagement. You can find an ungated version of the paper here.

Paul Meier passes away

The New York Times and Boing Boing point out that Paul Meier, one of the first and loudest proponents of randomized trials in medicine, passed away last week. From his obituary:

As early as the mid-1950s, Dr. Meier was one of the first and most vocal proponents of what is called “randomization.”

Under the protocol, researchers randomly assign one group of patients to receive an experimental treatment and another to receive the standard treatment. In that way, the researchers try to avoid unintentionally skewing the results by choosing, for example, the healthier or younger patients to receive the new treatment.

If the number of subjects is large enough, the two groups will be the same in every respect except the treatment they receive. Such randomized controlled trials are considered the most rigorous way to conduct a study and the best way to gather convincing evidence of a treatment’s effects.

Before randomization, the science of clinical trials was imprecise. Researchers, for example, would give a new treatment to patients who they thought might benefit and compare the outcomes to those of previous patients who were not treated, a method that could introduce serious bias.

The article rightly focuses on Meier's influence in medicine, but the influence of the randomized trial on modern economics (and on the applied social sciences in general) cannot be overstated. The insight that randomization lets one interpret a correlation as a causal effect, rather than as a mere association, underlies much of modern econometrics, and the framing and intellectual architecture built on that insight have produced a host of very important results. Everything from Freakonomics' popularization of applied microeconomics to the ongoing row in the development community between the highly successful "randomistas" and their critics can be traced back to the efforts of Meier and his allies to make medical research robust. Randomization clearly isn't everything, and even a well-identified research strategy is subject to a host of caveats. But if you're following this blog, doing applied work yourself, or just generally care about empirically determined (as opposed to theoretically justified) policy, you might want to raise a glass tonight to Meier and his legacy.
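A small simulation shows why randomization earns this status. The sketch below, in Python with NumPy, invents a hypothetical patient population in which baseline health drives outcomes and the treatment adds a true effect of 1.0. Assigning treatment by coin flip and comparing group means recovers that effect; letting the doctor hand the new treatment to the healthier patients (the practice the obituary describes) mixes the treatment effect with the health difference between groups. All names, coefficients, and sample sizes are illustrative assumptions.

```python
import numpy as np

def difference_in_means(outcome, treated):
    """Treated-group mean minus control-group mean."""
    return float(outcome[treated].mean() - outcome[~treated].mean())

def simulate(assign, n=20_000, effect=1.0, seed=0):
    """Hypothetical trial: outcome depends on baseline health plus the
    treatment effect; `assign` decides who gets treated."""
    rng = np.random.default_rng(seed)
    health = rng.normal(0.0, 1.0, n)       # hypothetical baseline health
    treated = assign(health, rng)          # boolean treatment indicator
    outcome = 2.0 * health + effect * treated + rng.normal(0.0, 1.0, n)
    return difference_in_means(outcome, treated)

def coin_flip(health, rng):
    """Randomized assignment, unrelated to health."""
    return rng.random(health.size) < 0.5

def healthier_get_treatment(health, rng):
    """Non-random assignment: healthier patients get the new treatment."""
    return health > 0.0

randomized_est = simulate(coin_flip)
selected_est = simulate(healthier_get_treatment)
print(f"randomized: {randomized_est:.2f}, hand-picked: {selected_est:.2f}")
```

The randomized estimate should land close to the true effect of 1.0, while the hand-picked comparison comes out several times larger, even though the treatment itself is identical in both runs.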

8.14.2011

Weather, stock market returns, and subtlety in causal inference

While hanging out with a few academic friends on Friday, I began discussing a recent research paper with someone I didn't know particularly well. It turned out that he was the odd man out in the group: instead of being a professor, postdoc, or grad student, he worked in finance, and he was skeptical of a lot of empirical work. Trotting out the classic "correlation doesn't imply causation" critique, he said something along the lines of "you could show that rain makes the stock market go up and down and it wouldn't mean anything." This of course reminded me of one of my favorite counterintuitive-but-compelling research literatures: the effects of weather on stock market returns.

Now, first off, one of the nice things about working with climate data is that causality is, in fact, generally pretty easy to establish. While humans appear to be quite good at affecting climate on decadal time scales, we generally cannot affect day-to-day or even month-to-month weather patterns, and we have great difficulty predicting the timing and spatial patterns of highly relevant weather events such as heat waves and storms even a few hours or days ahead. While this is bad from a welfare point of view (e.g., we'd love to be able to predict where a hurricane will make landfall a month ahead of time), it means that statistical estimates of the impact of weather itself on a given phenomenon, provided you're careful about your research design, can generally be given a causal interpretation. (see important caveat below)*

Given that, it turns out that there's some pretty strong evidence that weather affects stock market returns. There are multiple papers pointing out that stock market returns are affected by local weather (the latter of which contains this depressing gem of wisdom: "behavioral finance shows that lower temperature can lead to aggression, while higher temperature can lead to both apathy and aggression"). My favorite and, as far as I can tell from this literature, the definitive word on the subject so far, is this paper by Hirshleifer and Shumway showing that: yes, stock market returns are affected by the weather; the effect is driven by sunlight or the lack thereof and not by precipitation per se; but the effects are so small that the only way to profit from them through arbitrage is if you have absurdly low transaction costs (echoing one of my favorite applied finance papers of all time, Shleifer and Vishny's The Limits of Arbitrage).
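One practical wrinkle in this kind of analysis is that weather and returns can share a seasonal cycle, so a raw regression of returns on cloud cover can pick up seasonality rather than day-to-day weather. The Python/NumPy sketch below illustrates one simple fix under stated assumptions: subtract each calendar month's mean cloud cover and regress returns on what is left. The data are entirely synthetic, and the month-mean adjustment, coefficients, and helper names are hypothetical choices for illustration, not the specification used in any of the papers above.

```python
import numpy as np

def deseasonalize(series, month):
    """Subtract each calendar month's sample mean (a deliberately simple,
    hypothetical seasonal adjustment)."""
    out = series.astype(float).copy()
    for m in np.unique(month):
        mask = month == m
        out[mask] -= series[mask].mean()
    return out

def ols_slope(x, y):
    """Slope from a univariate OLS regression of y on x (with intercept)."""
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

rng = np.random.default_rng(42)
month = np.tile(np.repeat(np.arange(12), 21), 20)  # ~20 years of trading days
season = np.cos(2 * np.pi * month / 12)            # shared seasonal cycle
anomaly = rng.normal(0.0, 15.0, month.size)        # day-to-day weather shock
cloud = 50.0 + 20.0 * season + anomaly             # synthetic cloud cover, %
beta = -0.01                                       # hypothetical true effect
returns = 0.2 * season + beta * anomaly + rng.normal(0.0, 0.5, month.size)

raw = ols_slope(cloud, returns)
adjusted = ols_slope(deseasonalize(cloud, month), returns)
print(f"raw: {raw:.4f}, deseasonalized: {adjusted:.4f}, true: {beta}")
```

Because the deseasonalized series averages to zero within every month, it is orthogonal to anything that varies only by month, so the regression isolates the day-to-day weather shock. The raw slope, by contrast, is pulled toward zero by the shared seasonal pattern in this setup.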

If the sunlight result makes you think of SAD, or seasonal affective disorder, you're onto something interesting: Kamstra, Kramer and Levi find strong evidence (getting some nice identification from differences in solar insolation across hemispheres) that stock markets experience something like it, too. A follow-up paper argues that the same result can be captured with hemisphere-appropriate seasonality alone, and that an explicitly psychological 'SAD' effect is probably not supportable at present, though that finding was in turn disputed by Kamstra et al. Regardless, I'd argue that (a) seasonality driving markets is a fairly interesting idea, as is any result that links natural processes (which, after all, is what the seasons fundamentally are) to human behavior, and (b) this only underscores the importance of the caveat below.

All of which is to say that sometimes what seems at first glance to be a semi-ludicrous postulate can turn out to be quite true. Evidence has been found that stock markets are affected by everything from sports results to lunar phases, and in many cases these relationships seem both intuitive and robust. What those results mean, however, can sometimes be difficult to tease out (have I mentioned the important caveat*?), so a policy prescription or a deeper insight into human nature might not actually be forthcoming. In other words, perhaps my finance friend was right: you can show that stock markets are affected by sunny days, but what does that really mean?

* Important caveat: The fact that weather is exogenous doesn't mean that saying something about mechanisms / pathways / etc. is easy. Weather affects everything from crop production to labor supply to ecology and phenology to the stock market behavior described above, so if you're going to claim that weather affects something *through* some particular pathway, or, even more dangerously, plan on using it as an instrument, you should be very, very careful. Economists call the assumption that an instrument affects the outcome only through the pathway you've specified the "exclusion restriction," and if there's one concept I'd like to see enter the general population's memesphere, it's that.
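Here is a minimal Python/NumPy sketch of why the exclusion restriction matters, built on an entirely hypothetical setup: rain shifts crop yields, an unobserved factor `u` drives both crop yields and the outcome (so OLS is biased), and the true effect of crops on the outcome is 0.5. With a single instrument the IV estimate is the Wald ratio cov(rain, outcome) / cov(rain, crop). When rain reaches the outcome only through crops, IV recovers 0.5; once rain also has a direct path (say, through labor supply), the IV estimate absorbs that path and is wrong, even though rain is perfectly exogenous.

```python
import numpy as np

def wald_iv(z, x, y):
    """IV estimate with one instrument: cov(z, y) / cov(z, x)."""
    return float(np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1])

def ols(x, y):
    """Univariate OLS slope of y on x."""
    return float(np.cov(x, y)[0, 1] / np.var(x, ddof=1))

def simulate(direct_effect, n=50_000, seed=1):
    """Hypothetical model: rain -> crop -> outcome (true effect 0.5),
    u confounds crop and outcome, and rain may also hit the outcome
    directly with strength `direct_effect` (violating exclusion)."""
    rng = np.random.default_rng(seed)
    rain = rng.normal(size=n)              # exogenous weather shock
    u = rng.normal(size=n)                 # unobserved confounder
    crop = rain + u + rng.normal(size=n)   # first stage
    outcome = 0.5 * crop + direct_effect * rain + u + rng.normal(size=n)
    return ols(crop, outcome), wald_iv(rain, crop, outcome)

for direct in (0.0, 0.3):
    ols_est, iv_est = simulate(direct)
    print(f"direct rain effect {direct}: OLS {ols_est:.2f}, IV {iv_est:.2f} (true 0.5)")
```

In this model the IV estimate converges to 0.5 plus the direct effect divided by the first-stage strength, so a violated exclusion restriction biases IV by an amount the data alone cannot reveal.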