We stand at a significant inflection point in the trajectory of scientific discovery. As society continues its fast-paced digital transformation, so too do humankind's collective scientific knowledge and discourse. We now read and write papers in digital form, and much of the formal and informal process of science is captured digitally, including papers, preprints and books, code and datasets, conference presentations, and interactions on social networks and communication platforms. This transition has produced a tremendous amount of information, opening exciting opportunities for computational models and systems that analyze and harness it. In parallel, exponential growth in data processing power has fueled remarkable advances in AI, including self-supervised neural models capable of learning powerful representations from large-scale unstructured text without costly human supervision. The confluence of these societal and computational trends suggests that computer science is poised to ignite a revolution in the scientific process itself. However, the explosion of scientific data, results and publications stands in stark contrast to the constancy of human cognitive capacity. While scientific knowledge is expanding rapidly, our minds have remained static, with severe limits on our capacity to find, assimilate and manipulate information. We propose a research agenda of task-guided knowledge retrieval, in which systems counter humans' bounded capacity by ingesting corpora of scientific knowledge and retrieving inspirations, explanations, solutions and evidence, synthesized to directly augment human performance on salient tasks in scientific endeavors. We present initial progress on methods and prototypes, and lay out important opportunities and challenges ahead for computational approaches that have the potential to revolutionize science.