Choosing experiments to accelerate collective discovery

How efficient are research agendas?

Abstract: A scientist’s choice of research problem affects his or her personal career trajectory. Scientists’ combined choices affect the direction and efficiency of scientific discovery as a whole. In this paper, we infer preferences that shape problem selection from patterns of published findings and then quantify their efficiency. We represent research problems as links between scientific entities in a knowledge network. We then build a generative model of discovery informed by qualitative research on scientific problem selection. We map salient features from this literature to key network properties: an entity’s importance corresponds to its degree centrality, and a problem’s difficulty corresponds to the network distance it spans. Drawing on millions of papers and patents published over 30 years, we use this model to infer the typical research strategy used to explore chemical relationships in biomedicine. This strategy generates conservative research choices focused on building up knowledge around important molecules. These choices become more conservative over time. The observed strategy is efficient for initial exploration of the network and supports scientific careers that require steady output, but is inefficient for science as a whole. Through supercomputer experiments on a sample of the network, we study thousands of alternatives and identify strategies much more efficient at exploring mature knowledge networks. We find that increased risk-taking and the publication of experimental failures would substantially improve the speed of discovery. We consider institutional shifts in grant making, evaluation, and publication that would help realize these efficiencies.
The paper is Rzhetsky et al. (2015), "Choosing experiments to accelerate collective discovery." (via Shanee)
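To make the abstract's mapping concrete, here is a minimal sketch of the two network properties it names: degree centrality as a stand-in for an entity's importance, and shortest-path distance as a stand-in for a problem's difficulty. The toy network and the entity names are hypothetical, and the paper's actual generative model is considerably richer than this.

```python
from collections import deque

# Hypothetical toy knowledge network: nodes are scientific entities,
# edges are relationships that have already been published.
published_links = [("A", "B"), ("A", "C"), ("A", "D"), ("D", "E")]


def build_adjacency(links):
    """Return an undirected adjacency map {node: set of neighbours}."""
    adjacency = {}
    for u, v in links:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def degree(adjacency, node):
    """Degree centrality (unnormalised): number of known links to the node."""
    return len(adjacency.get(node, ()))


def network_distance(adjacency, source, target):
    """Shortest-path length via breadth-first search, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


adjacency = build_adjacency(published_links)

# A candidate research problem is a not-yet-published link between two entities.
conservative_problem = ("B", "C")  # both endpoints sit next to the hub "A"
risky_problem = ("B", "E")         # spans a longer path through the network

print(degree(adjacency, "A"))
print(network_distance(adjacency, *conservative_problem))
print(network_distance(adjacency, *risky_problem))
```

In this toy setup, a strategy that favors high-degree entities and short distances would keep proposing links around "A", which is the kind of conservative behavior the abstract describes.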


El Niño is coming, make this time different

Kyle Meng and I published an op-ed in the Guardian today to raise awareness of the potential socioeconomic impacts of the emerging El Niño and of possible policy responses to it. Forecasts this year are extraordinary. In particular, for folks who aren't climate wonks and who live in temperate locations, it is challenging to visualize the scale and scope of what might come down the pipeline this year in the tropics and subtropics. Read the op-ed here.

Countries where the majority of the population experience hotter conditions under El Niño are shown in red. Countries that get cooler under El Niño are shown in blue (reproduced from Hsiang and Meng, AER 2015).


Weekend Links

"Four dozen papers on conflict and fragility in Africa in under 2,000 words"

David Evans' coverage of last month's Annual Bank Conference on Africa is a great overview of some fascinating recent applied research. Highlights:

  • Extreme rain and drought both boost livestock theft in Kenya: raids driven by resource scarcity but also by weather that makes it easy to carry out a raid (Ralston).

  • Drought leads to increased violence against women. When the shock affects income asymmetrically across partners, it is associated with violence for the first time in the marriage (Cools et al.). 

  • Axbard et al. use variation in international mineral prices and within-country time and geographic variation to show that when a mine opens in South Africa, crime doesn’t increase. But you may not want to be around when the mine closes. 

  • “Members of ethnic groups exposed to greater historical missionary activity [in 19th-century Nigeria] express significantly less trust today,” using Afrobarometer trust measures (Okoye).

4.16.2015

    Social welfare and robots

    As long as we're on the joint topics of ways to end an abstract and social welfare:
    "Policies that redistribute income across generations can ensure that a rise in robotic productivity benefits all generations."
    The ungated NBER working paper is here. (h/t Tyler Cowen, who has some thoughts on the general issue)


    Social welfare

    Meanwhile, in excellent ways to end an abstract:
    "[The policy] would also generate a significant welfare gain from the ex-ante standpoint of a newborn under the veil of ignorance."
    The original paper is here


    Data-driven causal inference

    Distinguishing cause from effect using observational data: methods and benchmarks

    From the abstract:
    The discovery of causal relationships from purely observational data is a fundamental problem in science. The most elementary form of such a causal discovery problem is to decide whether X causes Y or, alternatively, Y causes X, given joint observations of two variables X, Y. This was often considered to be impossible. Nevertheless, several approaches for addressing this bivariate causal discovery problem were proposed recently. In this paper, we present the benchmark data set CauseEffectPairs that consists of 88 different "cause-effect pairs" selected from 31 datasets from various domains. We evaluated the performance of several bivariate causal discovery methods on these real-world benchmark data and on artificially simulated data. Our empirical results provide evidence that additive-noise methods are indeed able to distinguish cause from effect using only purely observational data. In addition, we prove consistency of the additive-noise method proposed by Hoyer et al. (2009).
    From the arxiv.org blog (note):
    The basis of the new approach is to assume that the relationship between X and Y is not symmetrical. In particular, they say that in any set of measurements there will always be noise from various cause. The key assumption is that the pattern of noise in the cause will be different to the pattern of noise in the effect. That’s because any noise in X can have an influence on Y but not vice versa.
    There's been a lot of research in stats on "causal discovery" techniques, and the paper in essence is running a horse race between Additive-Noise Methods and Information Geometric Causal Inference, with ANM winning out. Some nice overview slides providing background are here.
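For readers who want to see the asymmetry idea in action, here is a rough sketch of an additive-noise check on simulated data. It assumes NumPy is available. The polynomial regression, the biased HSIC dependence statistic with a median-heuristic bandwidth, and the simulated data-generating process are all illustrative choices of mine, not necessarily what the paper uses, and the comparison of raw scores stands in for a properly calibrated independence test.

```python
import numpy as np


def standardize(v):
    """Rescale to zero mean and unit variance."""
    return (v - v.mean()) / v.std()


def gaussian_gram(v):
    """Gaussian kernel matrix with a median-heuristic bandwidth."""
    sq_dists = (v[:, None] - v[None, :]) ** 2
    bandwidth_sq = np.median(sq_dists[sq_dists > 0])
    return np.exp(-sq_dists / (2.0 * bandwidth_sq))


def hsic(a, b):
    """Biased HSIC statistic: larger values suggest stronger dependence."""
    n = len(a)
    centering = np.eye(n) - np.ones((n, n)) / n
    k = gaussian_gram(standardize(a))
    l = gaussian_gram(standardize(b))
    return np.trace(k @ centering @ l @ centering) / n**2


def residual_dependence(cause, effect, degree=5):
    """Regress effect on cause, then score how much the residuals
    still depend on the hypothesised cause."""
    coeffs = np.polyfit(cause, effect, degree)
    residuals = effect - np.polyval(coeffs, cause)
    return hsic(cause, residuals)


def infer_direction(x, y):
    """Prefer the direction whose residuals look more independent of the input."""
    xs, ys = standardize(x), standardize(y)
    forward = residual_dependence(xs, ys)   # model: Y = f(X) + noise
    backward = residual_dependence(ys, xs)  # model: X = g(Y) + noise
    label = "X->Y" if forward < backward else "Y->X"
    return label, forward, backward


# Simulated example (an assumption for illustration): X really does cause Y.
rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=400)
y = x**3 + rng.uniform(-1.0, 1.0, size=400)

direction, forward_score, backward_score = infer_direction(x, y)
print(direction, forward_score, backward_score)
```

In the causal direction the residuals are roughly just the added noise, so they carry little information about X. In the reverse direction the spread of the residuals changes with Y, and that leftover dependence is what the score picks up.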


    Disasters and religiosity

    Jeanet Sinding Bentzen has a new version of her working paper on disasters (mostly earthquakes) and religiosity:
    Acts of God: Religiosity and Natural Disasters Across Subnational World Districts
    Religiosity affects everything from fertility and labor force participation to health. But why are some societies more religious than others? To answer this question, I test the religious coping theory, which states that many individuals draw on their religious beliefs to understand and deal with adverse life events. Combining subnational district level data on values across the globe from the World Values Survey with spatial data on natural disasters, I find that individuals are more religious when their district was hit recently by an earthquake. And further, that individuals are more religious when living in areas with higher long term earthquake risk. Using data on children of immigrants in Europe, I document that this is mainly due to a long-term effect: high religiosity levels evolving in high earthquake risk areas, is passed on through generations to individuals no longer living in high earthquake risk areas. The impact is global: earthquakes increase religiosity both within Christianity, Islam, and Hinduism, and within all continents. Last, I document that the results are consistent with the literature on religious coping and inconsistent with alternative theories of insurance or selection.
    Selected quote:
    "The estimates indicate that increasing earthquake risk by 30 percentiles from the median increases religiosity by 9 percentiles. The tendency is global: Christians, Muslims, and Hindus all exhibit higher religiosity in response to elevated earthquake risk, and so do inhabitants of every continent."
    via Amir.


    Spring thaw

    As long-time readers of the blog may have noticed, posting has been a little light the past 12 months. Sol and I are aiming to rectify that and will start posting more again over the next few weeks. Expect to see updates on some of our work, some resources and code snippets and, of course, coverage of papers and research we've found interesting. We hope you're all well, and look forward to getting the blog running again.


    On giving a great applied talk

    Jesse Shapiro* has some excellent slides on giving a good applied micro talk. They are specific enough to be of use to students prepping job market talks, and general enough to provide good fodder for thinking about how one presents one's work to any audience. I highly recommend them. (via Kyle Meng)

    *: yet another Stuyvesant High School graduate.


    Ecotourism and poverty

    This is a hard question to answer well, but it's certainly an interesting one.

    Quantifying causal mechanisms to determine how protected areas affect poverty through changes in ecosystem services and infrastructure
    Paul J. Ferraro and Merlin M. Hanauer

    Abstract: To develop effective environmental policies, we must understand the mechanisms through which the policies affect social and environmental outcomes. Unfortunately, empirical evidence about these mechanisms is limited, and little guidance for quantifying them exists. We develop an approach to quantifying the mechanisms through which protected areas affect poverty. We focus on three mechanisms: changes in tourism and recreational services; changes in infrastructure in the form of road networks, health clinics, and schools; and changes in regulating and provisioning ecosystem services and foregone production activities that arise from land-use restrictions. The contributions of ecotourism and other ecosystem services to poverty alleviation in the context of a real environmental program have not yet been empirically estimated. Nearly two-thirds of the poverty reduction associated with the establishment of Costa Rican protected areas is causally attributable to opportunities afforded by tourism. Although protected areas reduced deforestation and increased regrowth, these land cover changes neither reduced nor exacerbated poverty, on average. Protected areas did not, on average, affect our measures of infrastructure and thus did not contribute to poverty reduction through this mechanism. We attribute the remaining poverty reduction to unobserved dimensions of our mechanisms or to other mechanisms. Our study empirically estimates previously unidentified contributions of ecotourism and other ecosystem services to poverty alleviation in the context of a real environmental program. We demonstrate that, with existing data and appropriate empirical methods, conservation scientists and policymakers can begin to elucidate the mechanisms through which ecosystem conservation programs affect human welfare.


    When evidence does not suffice

    Halvard Buhaug and numerous coauthors have released a comment titled “One effect to rule them all? A comment on climate and conflict,” which critiques research on climate and human conflict that I published in Science and Climatic Change with my coauthors Marshall Burke and Edward Miguel.

    The comment does not address the actual content of our papers. Instead it states that our papers say things they do not say (or that our papers do not say things they actually do say) and then uses those inaccurate claims as evidence that our work is erroneous.

    I have posted my reaction to the comment on the G-FEED blog, written as the referee report that I would write if I were asked to referee the comment.

    (This is not the first time Buhaug and I have disagreed on what constitutes evidence. Kyle Meng and I recently published a paper in PNAS demonstrating that Buhaug’s 2010 critique of an earlier paper made aggressive claims that the earlier paper was wrong without actually providing evidence to support those claims.)