Andrew Gelman has a thought-provoking post on asking "Why?" in statistics:
Consider two broad classes of inferential questions:
1. Forward causal inference. What might happen if we do X? What are the effects of smoking on health, the effects of schooling on knowledge, the effect of campaigns on election outcomes, and so forth?
2. Reverse causal inference. What causes Y? Why do more attractive people earn more money? Why do many poor people vote for Republicans and rich people vote for Democrats? Why did the economy collapse? [...]
My question here is: How can we incorporate reverse causal questions into a statistical framework that is centered around forward causal inference? (Even methods such as path analysis or structural modeling, which some feel can be used to determine the direction of causality from data, are still ultimately answering forward causal questions of the sort, What happens to y when we change x?)
My resolution is as follows: Forward causal inference is about estimation; reverse causal inference is about model checking and hypothesis generation.
Among many gems is this:
A key theme in this discussion is the distinction between causal statements and causal questions. When Rubin dismissed reverse causal reasoning as “cocktail party chatter,” I think it was because you can’t clearly formulate a reverse causal statement. That is, a reverse causal question does not in general have a well-defined answer, even in a setting where all possible data are made available. But I think Rubin made a mistake in his dismissal. The key is that reverse questions are valuable in that they focus on an anomaly—an aspect of the data unlikely to be reproducible by the current (possibly implicit) model—and point toward possible directions of model improvement.
You can read the rest here.