Certainty and uncertainty are fundamental to science communication. Hedges have been widely used as proxies for uncertainty. However, certainty is a complex construct: authors express not only the degree but also the type and aspects of uncertainty in order to give the reader a particular impression of what is known. Here, we introduce a new study of certainty that models both the level and the aspects of certainty in scientific findings. Using a new dataset of 2167 annotated scientific findings, we demonstrate that hedges alone only partially explain certainty. We show that both overall certainty and its individual aspects can be predicted with pre-trained language models, providing a more complete picture of the author's intended communication. Downstream analyses on 431K scientific findings from news and scientific abstracts demonstrate that modeling sentence-level and aspect-level certainty is meaningful for areas like science communication. Both the model and the datasets used in this paper are released at https://blablablab.si.umich.edu/projects/certainty/.