Identifying cohorts of patients based on eligibility criteria such as medical conditions, procedures, and medication use is critical to recruitment for clinical trials. Such criteria are often most naturally described in free text, using language familiar to clinicians and researchers. To identify potential participants at scale, these criteria must first be translated into queries on clinical databases, a process that can be labor-intensive and error-prone. Natural language processing (NLP) methods offer a potential means of automating this conversion into database queries. However, they must first be trained and evaluated using corpora that capture clinical trial criteria in sufficient detail. In this paper, we introduce the Leaf Clinical Trials (LCT) corpus, a human-annotated corpus of over 1,000 clinical trial eligibility criteria descriptions that uses highly granular structured labels to capture a range of biomedical phenomena. We provide details of our schema, annotation process, corpus quality, and statistics. Additionally, we present baseline information extraction results on this corpus as benchmarks for future work.