Semantic textual similarity (STS) in the clinical domain helps improve diagnostic efficiency and produce concise texts for downstream data mining tasks. However, because clinical text depends heavily on domain knowledge, it remains challenging for general-purpose language models to infer the implicit medical relationships behind clinical sentences and estimate their similarity correctly. In this paper, we present a graph-augmented cyclic learning framework for similarity estimation in the clinical domain. The framework can be conveniently implemented on a state-of-the-art backbone language model, and it improves the backbone's performance by leveraging domain knowledge through co-training with an auxiliary network based on a graph convolutional network (GCN). Introducing domain knowledge through the GCN improves the Bio-clinical BERT baseline by 16.3%, and the co-training framework improves it by 27.9%.
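The abstract does not specify the layers of the auxiliary GCN. For reference, the sketch below implements one standard graph convolution layer, the Kipf and Welling propagation rule H' = ReLU(D^-1/2 (A + I) D^-1/2 H W). Several details are assumptions rather than the authors' design: that the auxiliary network uses this rule, that domain knowledge is encoded as edges between medical concepts, and the hypothetical helper names `matmul`, `normalized_adjacency`, and `gcn_layer`. The code uses plain Python lists so that it has no dependencies.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def normalized_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for an undirected adjacency matrix A."""
    n = len(adj)
    # Add self-loops so each node also keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Degree of each node in the self-looped graph.
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization keeps feature scales comparable across nodes.
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj, features, weights):
    """One graph convolution: ReLU(A_norm @ H @ W).

    adj:      n x n adjacency matrix (hypothetically, links between medical concepts)
    features: n x f node feature matrix H
    weights:  f x g weight matrix W (learned in practice, fixed here)
    """
    propagated = matmul(normalized_adjacency(adj), features)  # mix neighbor features
    out = matmul(propagated, weights)                         # linear projection
    return [[max(0.0, v) for v in row] for row in out]        # ReLU activation


if __name__ == "__main__":
    # Toy 3-node path graph 0 - 1 - 2 with one-hot node features.
    adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    features = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    weights = [[1], [-1], [0]]
    print(gcn_layer(adj, features, weights))
```

Each node's output mixes its own features with its neighbors' features, weighted by the normalized adjacency. Stacking such layers lets information propagate along multi-hop relations in a knowledge graph. How the paper couples these graph features with the language model during cyclic co-training is not described in the abstract and is not modeled here.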