Choosing experiments to accelerate collective discovery

How efficient are research agendas?

Abstract: A scientist’s choice of research problem affects his or her personal career trajectory. Scientists’ combined choices affect the direction and efficiency of scientific discovery as a whole. In this paper, we infer preferences that shape problem selection from patterns of published findings and then quantify their efficiency. We represent research problems as links between scientific entities in a knowledge network. We then build a generative model of discovery informed by qualitative research on scientific problem selection. We map salient features from this literature to key network properties: an entity’s importance corresponds to its degree centrality, and a problem’s difficulty corresponds to the network distance it spans. Drawing on millions of papers and patents published over 30 years, we use this model to infer the typical research strategy used to explore chemical relationships in biomedicine. This strategy generates conservative research choices focused on building up knowledge around important molecules. These choices become more conservative over time. The observed strategy is efficient for initial exploration of the network and supports scientific careers that require steady output, but is inefficient for science as a whole. Through supercomputer experiments on a sample of the network, we study thousands of alternatives and identify strategies much more efficient at exploring mature knowledge networks. We find that increased risk-taking and the publication of experimental failures would substantially improve the speed of discovery. We consider institutional shifts in grant making, evaluation, and publication that would help realize these efficiencies.
The paper is Rzhetsky et al. (2015), "Choosing experiments to accelerate collective discovery." (via Shanee)
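To make the abstract's mapping concrete, here is a minimal sketch on a toy network. The entity names and edges are invented for illustration, and this is not the authors' generative model. It only computes the two network properties the abstract names: raw degree (unnormalized, as a stand-in for an entity's importance) and shortest-path distance (as a stand-in for the difficulty of the problem linking two entities).

```python
from collections import deque

# Toy knowledge network (hypothetical): nodes are entities,
# edges are relationships that have already been published.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("D", "E"), ("E", "F")]


def build_graph(edge_list):
    """Build an undirected adjacency map from a list of edges."""
    graph = {}
    for u, v in edge_list:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def degree(graph, node):
    """Number of known relationships an entity takes part in."""
    return len(graph[node])


def distance(graph, source, target):
    """Shortest-path length via breadth-first search; None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph[node]:
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


graph = build_graph(edges)
print(degree(graph, "A"))         # "A" is the hub of this toy network
print(distance(graph, "B", "C"))  # a short hop through the hub
print(distance(graph, "B", "F"))  # a longer span across the network
```

In this framing, a candidate experiment linking "B" and "C" would count as a conservative choice (short distance, next to a high-degree entity), while one linking "B" and "F" would be the riskier, longer-range kind.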


El Niño is coming, make this time different

Kyle Meng and I published an op-ed in the Guardian today to raise awareness of the potential socioeconomic impacts of, and policy responses to, the emerging El Niño. Forecasts this year are extraordinary. For folks who aren't climate wonks and who live in temperate locations, it is especially challenging to visualize the scale and scope of what might come down the pipeline this year in the tropics and subtropics. Read the op-ed here.

Countries where the majority of the population experiences hotter conditions under El Niño are shown in red. Countries that get cooler under El Niño are shown in blue (reproduced from Hsiang and Meng, AER 2015).


Weekend Links

"Four dozen papers on conflict and fragility in Africa in under 2,000 words"

David Evans' coverage of last month's Annual Bank Conference on Africa is a great overview of some fascinating recent applied research. Highlights:

  • Extreme rain and drought both boost livestock theft in Kenya: raids driven by resource scarcity but also by weather that makes it easy to carry out a raid (Ralston).

  • Drought leads to increased violence against women. When the shock affects income asymmetrically across partners, it is associated with violence for the first time in the marriage (Cools et al.). 

  • Axbard et al. use variation in international mineral prices and within-country time and geographic variation to show that when a mine opens in South Africa, crime doesn’t increase. But you may not want to be around when the mine closes. 

  • “Members of ethnic groups exposed to greater historical missionary activity [in 19th-century Nigeria] express significantly less trust today,” using Afrobarometer trust measures (Okoye).
4.16.2015

    Social welfare and robots

    As long as we're on the joint topics of ways to end an abstract and social welfare:
    "Policies that redistribute income across generations can ensure that a rise in robotic productivity benefits all generations."
    The ungated NBER working paper is here. (h/t Tyler Cowen, who has some thoughts on the general issue)


    Social welfare

    Meanwhile, in excellent ways to end an abstract:
    "[The policy] would also generate a significant welfare gain from the ex-ante standpoint of a newborn under the veil of ignorance."
    The original paper is here.


    Data-driven causal inference

    Distinguishing cause from effect using observational data: methods and benchmarks

    From the abstract:
    The discovery of causal relationships from purely observational data is a fundamental problem in science. The most elementary form of such a causal discovery problem is to decide whether X causes Y or, alternatively, Y causes X, given joint observations of two variables X, Y. This was often considered to be impossible. Nevertheless, several approaches for addressing this bivariate causal discovery problem were proposed recently. In this paper, we present the benchmark data set CauseEffectPairs that consists of 88 different "cause-effect pairs" selected from 31 datasets from various domains. We evaluated the performance of several bivariate causal discovery methods on these real-world benchmark data and on artificially simulated data. Our empirical results provide evidence that additive-noise methods are indeed able to distinguish cause from effect using only purely observational data. In addition, we prove consistency of the additive-noise method proposed by Hoyer et al. (2009).
    From the arxiv.org blog (note):
    The basis of the new approach is to assume that the relationship between X and Y is not symmetrical. In particular, they say that in any set of measurements there will always be noise from various causes. The key assumption is that the pattern of noise in the cause will be different to the pattern of noise in the effect. That’s because any noise in X can have an influence on Y but not vice versa.
    There's been a lot of research in stats on "causal discovery" techniques, and the paper in essence is running a horse race between Additive-Noise Methods and Information Geometric Causal Inference, with ANM winning out. Some nice overview slides providing background are here.
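The asymmetry described above can be illustrated with a rough sketch. Everything in it is an assumption made for illustration: the data are simulated, the fit is a plain least-squares line (the linear, non-Gaussian special case of an additive-noise model), and dependence between residuals and input is scored with a crude correlation of squares rather than a proper independence test. It is not the procedure benchmarked in the paper.

```python
import random


def mean(values):
    return sum(values) / len(values)


def correlation(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def residuals(cause, effect):
    """Residuals of a least-squares line fitted as effect ~ cause."""
    mc, me = mean(cause), mean(effect)
    slope = (sum((c - mc) * (e - me) for c, e in zip(cause, effect))
             / sum((c - mc) ** 2 for c in cause))
    return [(e - me) - slope * (c - mc) for c, e in zip(cause, effect)]


def dependence_score(cause, effect):
    """Crude proxy for dependence between the input and the fit's residuals.

    Residuals of a linear fit are always uncorrelated with the input, so we
    correlate the squared (centered) input with the squared residuals instead.
    """
    res = residuals(cause, effect)
    mc = mean(cause)
    return abs(correlation([(c - mc) ** 2 for c in cause],
                           [r ** 2 for r in res]))


def infer_direction(x, y):
    """Prefer the direction whose residuals look less dependent on the input."""
    forward = dependence_score(x, y)   # hypothesis: X causes Y
    backward = dependence_score(y, x)  # hypothesis: Y causes X
    return "X -> Y" if forward < backward else "Y -> X"


# Simulated data where X really does cause Y, with non-Gaussian noise.
rng = random.Random(0)
x = [rng.uniform(-1, 1) for _ in range(2000)]
y = [xi + rng.uniform(-1, 1) for xi in x]

print(dependence_score(x, y), dependence_score(y, x))
print(infer_direction(x, y))
```

The uniform noise matters here: with a linear relationship and Gaussian noise, the two directions would be indistinguishable under this approach.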


    Disasters and religiosity

    Jeanet Sinding Bentzen has a new version of her working paper on disasters (mostly earthquakes) and religiosity:
    Acts of God: Religiosity and Natural Disasters Across Subnational World Districts
    Religiosity affects everything from fertility and labor force participation to health. But why are some societies more religious than others? To answer this question, I test the religious coping theory, which states that many individuals draw on their religious beliefs to understand and deal with adverse life events. Combining subnational district level data on values across the globe from the World Values Survey with spatial data on natural disasters, I find that individuals are more religious when their district was hit recently by an earthquake. And further, that individuals are more religious when living in areas with higher long term earthquake risk. Using data on children of immigrants in Europe, I document that this is mainly due to a long-term effect: high religiosity levels evolving in high earthquake risk areas, is passed on through generations to individuals no longer living in high earthquake risk areas. The impact is global: earthquakes increase religiosity both within Christianity, Islam, and Hinduism, and within all continents. Last, I document that the results are consistent with the literature on religious coping and inconsistent with alternative theories of insurance or selection.
    Selected quote:
    "The estimates indicate that increasing earthquake risk by 30 percentiles from the median increases religiosity by 9 percentiles. The tendency is global: Christians, Muslims, and Hindus all exhibit higher religiosity in response to elevated earthquake risk, and so do inhabitants of every continent."
     via Amir.


    Spring thaw

    As long-time readers of the blog may have noticed, posting has been a little light the past 12 months. Sol and I are aiming to rectify that and will start posting more again over the next few weeks. Expect to see updates on some of our work, some resources and code snippets, and, of course, coverage of papers and research we've found interesting. We hope you're all well, and look forward to getting the blog running again.