LeafAI: Query generator for clinical cohort discovery rivaling a human programmer

Objective: Identifying study-eligible patients within clinical databases is a critical step in clinical research. However, accurate query design typically requires extensive technical and biomedical expertise. We sought to create a system capable of generating data model-agnostic queries while also providing novel logical reasoning capabilities for complex clinical trial eligibility criteria.

Materials and Methods: Creating queries from eligibility criteria requires solving several text-processing problems, including named entity recognition and relation extraction, sequence-to-sequence transformation, normalization, and reasoning. We combined hybrid deep learning and rule-based modules to address these, together with a knowledge base built on the Unified Medical Language System (UMLS) and linked ontologies. To enable data model-agnostic query creation, we introduce a novel method for tagging database schema elements using UMLS concepts. To evaluate our system, called LeafAI, we compared its ability to identify patients who had been enrolled in 8 clinical trials conducted at our institution with that of a human database programmer. We measured performance by the number of actually enrolled patients matched by the generated queries.

Results: Across the 8 clinical trials, LeafAI matched a mean of 43% of enrolled patients, with 27,225 patients found eligible, compared with 27% matched and 14,587 eligible for queries written by a human database programmer. The human programmer spent 26 hours in total crafting queries, compared with several minutes for LeafAI.

Conclusions: Our work contributes a state-of-the-art, data model-agnostic query generation system capable of conditional reasoning using a knowledge base. We demonstrate that LeafAI can rival a human programmer in finding patients eligible for clinical trials.
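The abstract describes tagging database schema elements with UMLS concepts so that generated queries are not tied to one fixed data model. LeafAI's actual implementation is not described here, so the following is only a minimal, hypothetical Python sketch of the general idea: columns in two differently shaped schemas carry the same concept tags, and a single normalized criterion is mapped to SQL for each. The concept identifiers, table and column names, and the `TaggedColumn`, `Criterion`, `find_column`, and `build_query` names are illustrative assumptions rather than LeafAI's API, and the identifiers are placeholders, not real UMLS CUIs.

```python
from dataclasses import dataclass

# Hypothetical placeholder concept identifiers standing in for UMLS concepts.
# These are NOT real UMLS CUIs; they only illustrate the tagging idea.
PATIENT_ID = "CONCEPT_PATIENT_IDENTIFIER"
DIAGNOSIS_CODE = "CONCEPT_DIAGNOSIS_CODE"


@dataclass(frozen=True)
class TaggedColumn:
    """One column of a local data model plus the concepts it is tagged with."""
    table: str
    column: str
    concepts: frozenset


@dataclass(frozen=True)
class Criterion:
    """A normalized eligibility criterion: which kind of data, and which code."""
    category: str  # concept naming the kind of data required
    code: str      # normalized code to match, e.g. an ICD-10-CM code


def find_column(columns, concept):
    """Return the first column tagged with `concept`, or raise LookupError."""
    for col in columns:
        if concept in col.concepts:
            return col
    raise LookupError(f"no schema element tagged with {concept!r}")


def build_query(schema, criterion):
    """Translate one criterion into parameterized SQL for any tagged schema."""
    value_col = find_column(schema, criterion.category)
    # Look for the patient identifier in the same table as the value column.
    same_table = [c for c in schema if c.table == value_col.table]
    patient_col = find_column(same_table, PATIENT_ID)
    sql = (
        f"SELECT DISTINCT {patient_col.column} FROM {value_col.table} "
        f"WHERE {value_col.column} = ?"
    )
    return sql, (criterion.code,)


# Two hypothetical, differently shaped schemas tagged with the same concepts.
OMOP_LIKE = [
    TaggedColumn("condition_occurrence", "person_id", frozenset({PATIENT_ID})),
    TaggedColumn("condition_occurrence", "condition_source_value",
                 frozenset({DIAGNOSIS_CODE})),
]
LOCAL_EHR = [
    TaggedColumn("dx", "mrn", frozenset({PATIENT_ID})),
    TaggedColumn("dx", "icd10", frozenset({DIAGNOSIS_CODE})),
]

if __name__ == "__main__":
    # ICD-10-CM E11.9: type 2 diabetes mellitus without complications.
    t2dm = Criterion(DIAGNOSIS_CODE, "E11.9")
    for schema in (OMOP_LIKE, LOCAL_EHR):
        print(build_query(schema, t2dm))
```

Running it prints one parameterized query per schema from the same criterion. Passing the code as a bound parameter instead of splicing it into the SQL string keeps the output safe to execute with a DB-API driver such as `sqlite3`. A real system would also need joins across tables, reasoning over concept hierarchies, and multi-part criteria, none of which this sketch attempts.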
