Choosing experiments to accelerate collective discovery
A scientist’s choice of research problem affects his or her personal career trajectory. Scientists’ combined choices affect the direction and efficiency of scientific discovery as a whole. In this paper, we infer the preferences that shape problem selection from patterns of published findings and then quantify the efficiency of the strategies they produce. We represent research problems as links between scientific entities in a knowledge network. We then build a generative model of discovery informed by qualitative research on scientific problem selection. We map salient features from this literature to key network properties: an entity’s importance corresponds to its degree centrality, and a problem’s difficulty corresponds to the network distance it spans. Drawing on millions of papers and patents published over 30 years, we use this model to infer the typical research strategy used to explore chemical relationships in biomedicine. This strategy generates conservative research choices focused on building up knowledge around important molecules, and these choices become more conservative over time. The observed strategy is efficient for initial exploration of the network and supports scientific careers that require steady output, but it is inefficient for science as a whole. Through supercomputer experiments on a sample of the network, we study thousands of alternative strategies and identify some that are much more efficient at exploring mature knowledge networks. We find that increased risk-taking and the publication of experimental failures would substantially improve the speed of discovery. We consider institutional shifts in grant making, evaluation, and publication that would help realize these efficiencies.
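The mapping described above (importance as degree centrality, difficulty as the network distance a candidate link spans) can be illustrated with a small sketch. It assumes an undirected knowledge network stored as a dict of neighbor sets. The log-linear `score`, the softmax `choice_probabilities`, the weights `alpha` and `beta`, the fixed distance assigned to unconnected pairs, and the toy entities `A` to `F` are all hypothetical illustrations, not the paper's fitted generative model.

```python
import math
from collections import deque


def degree(graph, node):
    """Importance proxy: unnormalized degree centrality (number of neighbors)."""
    return len(graph.get(node, ()))


def distance(graph, source, target):
    """Difficulty proxy: shortest-path length between two entities via BFS.

    Returns math.inf when the target is unreachable from the source.
    """
    if source == target:
        return 0
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, d = frontier.popleft()
        for nbr in graph.get(node, ()):
            if nbr == target:
                return d + 1
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, d + 1))
    return math.inf


def score(graph, u, v, alpha, beta):
    """Hypothetical log-linear preference for studying the link (u, v).

    alpha rewards important (high-degree) entities; beta penalizes distance,
    so a larger beta models a more conservative, risk-averse strategy.
    """
    importance = math.log1p(degree(graph, u)) + math.log1p(degree(graph, v))
    dist = distance(graph, u, v)
    if math.isinf(dist):
        # Assumption: unconnected pairs get a fixed large distance.
        dist = len(graph)
    return alpha * importance - beta * dist


def choice_probabilities(graph, candidates, alpha, beta):
    """Softmax over candidate problems (entity pairs) using score()."""
    scores = [score(graph, u, v, alpha, beta) for u, v in candidates]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return {pair: w / total for pair, w in zip(candidates, weights)}


# Toy network: A is a well-studied "hub" entity; F is isolated.
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A", "E"},
    "E": {"D"},
    "F": set(),
}
candidates = [("B", "D"), ("C", "E"), ("E", "F")]

conservative = choice_probabilities(graph, candidates, alpha=1.0, beta=1.0)
risk_tolerant = choice_probabilities(graph, candidates, alpha=1.0, beta=0.2)
print(conservative)
print(risk_tolerant)
```

In this toy setting, lowering `beta` shifts probability toward problems that span longer distances, such as the pair `("C", "E")`, which is one simple way to express the "increased risk-taking" the abstract discusses.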