The meaningful use of electronic health records (EHR) continues to progress in the digital era with clinical decision support systems augmented by artificial intelligence. A priority in improving provider experience is to overcome information overload and reduce cognitive burden so that fewer medical errors and cognitive biases are introduced during patient care. One major type of medical error is diagnostic error, which arises from systematic or predictable errors in judgment that rely on heuristics. The potential for clinical natural language processing (cNLP) to model human diagnostic reasoning, with forward reasoning from data to diagnosis, and potentially to reduce cognitive burden and medical error has not been investigated. Existing tasks to advance the science in cNLP have largely focused on information extraction and named entity recognition through classification tasks. We introduce a novel suite of tasks, Diagnostic Reasoning Benchmarks (DR.BENCH), as a new benchmark for developing and evaluating cNLP models with clinical diagnostic reasoning ability. The suite includes six tasks from ten publicly available datasets addressing clinical text understanding, medical knowledge reasoning, and diagnosis generation. DR.BENCH is the first clinical suite of tasks designed as a natural language generation framework for evaluating pre-trained language models. Experiments with state-of-the-art pre-trained generative language models, using both large general-domain models and models that were continually trained on a medical corpus, demonstrate opportunities for improvement when evaluated on DR.BENCH. We share DR.BENCH as a publicly available GitLab repository with a systematic approach to load and evaluate models for the cNLP community.