Benchmark datasets driving artificial intelligence development fail to capture the needs of medical professionals

Publicly accessible benchmarks that allow for assessing and comparing model performance are important drivers of progress in artificial intelligence (AI). While recent advances in AI capabilities hold the potential to transform medical practice by assisting and augmenting the cognitive processes of healthcare professionals, it remains largely unclear how well AI benchmarks cover clinically relevant tasks. Furthermore, there is a lack of systematized meta-information that would allow clinical AI researchers to quickly determine the accessibility, scope, content and other characteristics of datasets and benchmark datasets relevant to the clinical domain. To address these issues, we curated and released a comprehensive catalogue of datasets and benchmarks pertaining to the broad domain of clinical and biomedical natural language processing (NLP), based on a systematic review of literature and online resources. A total of 450 NLP datasets were manually systematized and annotated with rich metadata, such as targeted tasks, clinical applicability, data types, performance metrics, accessibility and licensing information, and availability of data splits. We then compared the tasks covered by AI benchmark datasets with relevant tasks that medical practitioners reported as highly desirable targets for automation in a previous empirical study. Our analysis indicates that AI benchmarks of direct clinical relevance are scarce and fail to cover most of the work activities that clinicians want to see addressed. In particular, tasks associated with routine documentation and patient data administration workflows are not represented, despite their significant associated workloads. Thus, currently available AI benchmarks are poorly aligned with desired targets for AI automation in clinical settings, and novel benchmarks should be created to fill these gaps.
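The comparison described above, matching the tasks recorded in annotated catalogue entries against the tasks clinicians want automated, amounts to a set difference. The following is a minimal Python sketch under stated assumptions: the `DatasetEntry` fields paraphrase the metadata categories listed in the abstract but are not the authors' actual schema, treating an entry as a usable benchmark only when it is clinically applicable and ships predefined data splits is a simplification, and all dataset and task names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """One catalogue record. Field names are hypothetical, not the authors' schema."""
    name: str
    tasks: set[str]                  # targeted NLP tasks
    clinically_applicable: bool      # directly relevant to clinical work?
    data_types: set[str] = field(default_factory=set)
    metrics: set[str] = field(default_factory=set)
    accessibility: str = "unknown"   # e.g. "public", "registration", "restricted"
    license: str = "unknown"
    has_splits: bool = False         # predefined train/dev/test splits available?


def coverage(catalogue: list[DatasetEntry], desired_tasks: set[str]) -> tuple[set[str], set[str]]:
    """Split clinician-desired tasks into (covered, gaps).

    Assumption: only clinically applicable entries with predefined splits
    count as benchmarks that cover a task.
    """
    covered: set[str] = set()
    for entry in catalogue:
        if entry.clinically_applicable and entry.has_splits:
            covered |= entry.tasks & desired_tasks
    return covered, desired_tasks - covered


if __name__ == "__main__":
    # Hypothetical toy catalogue, not data from the study.
    catalogue = [
        DatasetEntry("toy_ner", {"entity recognition"}, True, has_splits=True),
        DatasetEntry("toy_qa", {"question answering"}, False, has_splits=True),
    ]
    desired = {"entity recognition", "question answering", "discharge summary drafting"}
    covered, gaps = coverage(catalogue, desired)
    print(sorted(covered))  # ['entity recognition']
    print(sorted(gaps))     # ['discharge summary drafting', 'question answering']
```

Keeping the tasks as plain sets makes the gap analysis a one-line difference; a real catalogue would additionally need a shared task vocabulary so that benchmark tasks and clinician-reported activities can be matched at all.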
