In the electronic health record, using clinical notes to identify entities such as disorders and their temporality (e.g., the order of an event relative to a time index) can inform many important analyses. However, creating training data for clinical entity tasks is time-consuming, and sharing labeled data is challenging due to privacy concerns. The information demands of the COVID-19 pandemic highlight the need for agile methods of training machine learning models for clinical notes. We present Trove, a framework for weakly supervised entity classification using medical ontologies and expert-generated rules. Unlike hand-labeled notes, our approach is easy to share and modify, while offering performance comparable to learning from manually labeled training data. In this work, we validate our framework on six benchmark tasks and demonstrate Trove's ability to analyze the records of patients visiting the emergency department at Stanford Health Care for COVID-19 presenting symptoms and risk factors.
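To make the idea of weak supervision from ontologies and rules concrete, here is a minimal sketch of the general technique, not Trove's actual API. It assumes small hypothetical term sets standing in for real medical ontologies, one hypothetical expert rule (an "-itis" suffix heuristic), and a plain majority vote in place of a learned label model. Each labeling function votes positive, negative, or abstains on a candidate text span, and the votes are combined into a single noisy training label.

```python
from collections import Counter
from typing import Callable, List, Optional

# Label values: 1 = disorder, 0 = not a disorder, None = abstain.
POS, NEG, ABSTAIN = 1, 0, None

# Hypothetical toy term sets standing in for real medical ontologies.
DISORDER_TERMS = {"fever", "cough", "pneumonia", "hypertension"}
NON_DISORDER_TERMS = {"aspirin", "chest x-ray", "emergency department"}


def lf_disorder_ontology(span: str) -> Optional[int]:
    """Ontology-based labeling function: vote POS on a disorder-term match."""
    return POS if span.lower() in DISORDER_TERMS else ABSTAIN


def lf_non_disorder_ontology(span: str) -> Optional[int]:
    """Ontology-based labeling function: vote NEG on a non-disorder-term match."""
    return NEG if span.lower() in NON_DISORDER_TERMS else ABSTAIN


def lf_itis_suffix(span: str) -> Optional[int]:
    """Hypothetical expert rule: spans ending in '-itis' usually name inflammation."""
    return POS if span.lower().endswith("itis") else ABSTAIN


LFS: List[Callable[[str], Optional[int]]] = [
    lf_disorder_ontology,
    lf_non_disorder_ontology,
    lf_itis_suffix,
]


def majority_vote(span: str, lfs: List[Callable[[str], Optional[int]]]) -> Optional[int]:
    """Combine non-abstaining votes; abstain on no votes or a tie."""
    votes = [v for v in (lf(span) for lf in lfs) if v is not None]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting votes with equal support
    return ranked[0][0]


spans = ["Fever", "aspirin", "appendicitis", "walked"]
labels = {s: majority_vote(s, LFS) for s in spans}
print(labels)  # {'Fever': 1, 'aspirin': 0, 'appendicitis': 1, 'walked': None}
```

The resulting noisy labels would then serve as training data for a downstream classifier. Majority vote treats every labeling function as equally reliable; a learned label model that estimates each function's accuracy is the usual refinement of this step.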