Purpose: This study aims to develop a novel approach to extracting and measuring uncertain biomedical knowledge from scientific statements. Design/methodology/approach: Taking cardiovascular research publications in China as a sample, we extracted subject-predicate-object (SPO) triples as the knowledge units and hedging/conflicting uncertainties as the knowledge context. We introduced Information Entropy and Uncertainty Rate as potential metrics to quantify the uncertainty of biomedical knowledge claims represented at different levels: the SPO triples (micro level) and the semantic type pairs (macro level). Findings: The results indicated that while the number of scientific publications and the total number of SPO triples grew linearly, the number of novel SPO triples appearing each year remained stable. After examining the frequency of uncertainty cue words in different parts of scientific statements, we found that hedging words tend to appear in conclusive and purposeful sentences, whereas conflicting terms often appear in background sentences and act as the premise (e.g., unsettled scientific issues) of the work to be investigated. Practical implications: Our approach identified major areas of uncertain knowledge, such as diagnostic biomarkers, genetic characteristics, and pharmacologic therapies surrounding cardiovascular diseases in China. We suggest prioritizing these areas, in which new hypotheses need to be verified and disputes, conflicts, and contradictions need to be settled.
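The abstract names the two metrics but does not give their formulas. As a minimal sketch only, the Python below assumes that Information Entropy is the Shannon entropy of the certainty labels (certain, hedging, conflicting) attached to a knowledge unit, and that Uncertainty Rate is the share of statements labelled hedging or conflicting. The function names, the label set, and the toy statements are hypothetical and are not taken from the study.

```python
import math
from collections import Counter, defaultdict


def information_entropy(labels):
    """Shannon entropy, in bits, of a list of certainty labels (assumed definition)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def uncertainty_rate(labels):
    """Share of statements whose label is not 'certain' (assumed definition)."""
    return sum(1 for label in labels if label != "certain") / len(labels)


def summarize(statements, key):
    """Group statements by `key` and return {group: (entropy, uncertainty rate)}."""
    groups = defaultdict(list)
    for statement in statements:
        groups[key(statement)].append(statement[-1])  # last field is the label
    return {
        group: (information_entropy(labels), uncertainty_rate(labels))
        for group, labels in groups.items()
    }


# Made-up statements: (subject, predicate, object, subject type, object type, label).
# The type codes are used here purely as illustrative group keys.
STATEMENTS = [
    ("aspirin", "TREATS", "myocardial infarction", "phsu", "dsyn", "certain"),
    ("aspirin", "TREATS", "myocardial infarction", "phsu", "dsyn", "hedging"),
    ("statin", "PREVENTS", "stroke", "phsu", "dsyn", "conflicting"),
    ("statin", "PREVENTS", "stroke", "phsu", "dsyn", "conflicting"),
]

# Micro level: one score pair per SPO triple.
by_triple = summarize(STATEMENTS, key=lambda s: s[:3])
# Macro level: one score pair per semantic type pair.
by_type_pair = summarize(STATEMENTS, key=lambda s: (s[3], s[4]))

if __name__ == "__main__":
    for group, (entropy, rate) in {**by_triple, **by_type_pair}.items():
        print(group, round(entropy, 3), round(rate, 3))
```

Under these assumed definitions the two metrics capture different things: a triple that is always reported as conflicting has an uncertainty rate of 1 but an entropy of 0, because its labels never vary, whereas a triple reported half as certain and half as hedged has the maximum two-label entropy of 1 bit.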