Medical question answering (QA) systems have the potential to resolve clinicians' uncertainties about treatment and diagnosis on demand, informed by the latest evidence. However, despite the significant progress in general QA made by the NLP community, medical QA systems are still not widely used in clinical environments. One likely reason for this is that clinicians may not readily trust QA system outputs, in part because transparency, trustworthiness, and provenance have not been key considerations in the design of such models. In this paper we discuss a set of criteria that, if met, we argue would likely increase the utility of biomedical QA systems, which may in turn lead to the adoption of such systems in practice. We assess existing models, tasks, and datasets with respect to these criteria, highlighting shortcomings of previously proposed approaches and pointing toward what might be more usable QA systems.