Developing and Using Special-Purpose Lexicons for Cohort Selection from Clinical Notes

Background and Significance: Selecting cohorts for a clinical trial typically requires costly and time-consuming manual chart reviews, which results in poor participation. To help automate the process, National NLP Clinical Challenges (N2C2) conducted a shared challenge by defining 13 criteria for clinical trial cohort selection and by providing training and test datasets. This research was motivated by the N2C2 challenge. Methods: We broke the task down into 13 independent subtasks, one per criterion, and implemented each subtask using either rules or a supervised machine learning model. Each subtask critically depended on knowledge resources in the form of task-specific lexicons, for which we developed a novel model-driven approach. The approach allowed us to first expand the lexicon from a seed set and then remove noise from the list, thus improving accuracy. Results: Our system achieved an overall F measure of 0.9003 at the challenge and was statistically tied for first place out of 45 participants. The model-driven lexicon development, together with further debugging of the rules and code on the training set, improved the overall F measure to 0.9140, exceeding the best numerical result at the challenge. Discussion: Cohort selection, like phenotype extraction and classification, is amenable to rule-based or simple machine learning methods; however, the lexicons involved, such as medication names or medical terms referring to a medical problem, critically determine the overall accuracy. Automated lexicon development has the potential for scalability and accuracy.
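The abstract does not spell out how the model-driven lexicon development works, so the following is only a minimal sketch of the general idea of expanding a lexicon from a seed set and then discarding noisy terms. It assumes a single rule-based criterion that fires when any lexicon term appears in a note, and it uses a greedy filter that keeps a candidate term only if it strictly raises the F measure on the training notes. The function names (`build_matcher`, `refine_lexicon`), the toy notes, and the greedy filter itself are illustrative assumptions, not the authors' method.

```python
import re

def build_matcher(lexicon):
    """Compile one case-insensitive, whole-word regex for all lexicon terms."""
    terms = sorted((re.escape(t.lower()) for t in lexicon), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(terms) + r")\b")

def predict(notes, lexicon):
    """Rule: the criterion is met if any lexicon term occurs in the note."""
    matcher = build_matcher(lexicon)
    return [matcher.search(note.lower()) is not None for note in notes]

def f_measure(gold, pred):
    """Balanced F measure (F1) for the positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def refine_lexicon(seed, candidates, notes, gold):
    """Hypothetical greedy filter: keep a candidate only if it improves training F."""
    lexicon = set(seed)
    best = f_measure(gold, predict(notes, lexicon))
    for term in candidates:
        score = f_measure(gold, predict(notes, lexicon | {term}))
        if score > best:  # strict improvement; otherwise treat the term as noise
            lexicon.add(term)
            best = score
    return lexicon

# Toy training data (invented for illustration): criterion "patient uses aspirin".
notes = [
    "Patient takes aspirin daily.",
    "No medications reported.",
    "Started on ASA 81 mg.",
    "History of asthma, uses inhaler.",
]
gold = [True, False, True, False]

refined = refine_lexicon({"aspirin"}, ["asa", "inhaler"], notes, gold)
final_f = f_measure(gold, predict(notes, refined))
print(sorted(refined), final_f)
```

On this toy data the abbreviation "asa" recovers a missed positive note and is kept, while "inhaler" introduces a false positive and is dropped. A real system would draw candidates from a much larger source and evaluate on held-out data to avoid overfitting the lexicon to the training set.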

