Objective: To develop a natural language processing system that solves both clinical concept extraction and relation extraction in a unified prompt-based machine reading comprehension (MRC) architecture with good generalizability for cross-institution applications. Methods: We formulate both clinical concept extraction and relation extraction using a unified prompt-based MRC architecture and explore state-of-the-art transformer models. We compare our MRC models with existing deep learning models for concept extraction and end-to-end relation extraction using two benchmark datasets developed by the 2018 National NLP Clinical Challenges (n2c2) challenge (medications and adverse drug events) and the 2022 n2c2 challenge (relations of social determinants of health [SDoH]). We also evaluate the transfer learning ability of the proposed MRC models in a cross-institution setting. We perform error analyses and examine how different prompting strategies affect the performance of MRC models. Results and Conclusion: The proposed MRC models achieve state-of-the-art performance for clinical concept and relation extraction on the two benchmark datasets, outperforming previous non-MRC transformer models. GatorTron-MRC achieves the best strict and lenient F1-scores for concept extraction, outperforming previous deep learning models on the two datasets by 1% to 3% and 0.7% to 1.3%, respectively. For end-to-end relation extraction, GatorTron-MRC and BERT-MIMIC-MRC achieve the best F1-scores, outperforming previous deep learning models by 0.9% to 2.4% and 10% to 11%, respectively. For cross-institution evaluation, GatorTron-MRC outperforms traditional GatorTron by 6.4% and 16% for the two datasets, respectively. The proposed method is better at handling nested/overlapped concepts and at extracting relations, and it has good portability for cross-institution applications.
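To make the unified formulation more concrete, the sketch below shows one way both tasks could be cast as the same kind of (query, context) reading-comprehension instance. It is only an illustration under stated assumptions: the prompt wording, the `MRCExample` class, the helper names, and the `[SEP]` separator are hypothetical and are not taken from the paper, which does not give its templates in this abstract.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MRCExample:
    """One reading-comprehension instance: a query asked of a clinical note."""
    task: str      # "concept" or "relation"
    query: str     # natural-language prompt (hypothetical wording)
    context: str   # the clinical text the model reads


def concept_query(concept_type: str) -> str:
    # Hypothetical template: one query per concept type.
    return f"Find all {concept_type} mentions in the text."


def relation_query(head_text: str, tail_type: str) -> str:
    # Hypothetical template: the already-extracted head concept is placed
    # in the query, so relation extraction becomes another span-finding task.
    return f"Find all {tail_type} mentions related to '{head_text}'."


def build_examples(
    note: str,
    concept_types: List[str],
    heads: List[str],
    tail_types: List[str],
) -> List[MRCExample]:
    """Build concept and relation instances that share one input format."""
    examples = [MRCExample("concept", concept_query(t), note) for t in concept_types]
    for head in heads:
        for tail_type in tail_types:
            examples.append(MRCExample("relation", relation_query(head, tail_type), note))
    return examples


def to_model_input(example: MRCExample, sep: str = " [SEP] ") -> str:
    # Assumed encoding: query and context joined by a separator, as is
    # common for BERT-style MRC models; a real tokenizer would handle this.
    return example.query + sep + example.context


if __name__ == "__main__":
    note = "Patient started on lisinopril 10 mg daily; developed cough."
    for ex in build_examples(note, ["drug", "adverse drug event"],
                             ["lisinopril"], ["dosage", "frequency"]):
        print(ex.task, "|", to_model_input(ex))
```

Because every instance has the same shape, a single span-extraction model can in principle serve both tasks, and asking a separate query per concept type is one way nested or overlapping spans can each be returned without conflicting labels.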