Transformer-based models, such as BERT, have dramatically improved performance on a variety of natural language processing tasks. The clinical knowledge enriched model ClinicalBERT also achieved state-of-the-art results on clinical named entity recognition and natural language inference tasks. A core limitation of these transformers is their substantial memory consumption, caused by the full self-attention mechanism. To overcome this, long-sequence transformer models, e.g. Longformer and BigBird, were proposed, which use sparse attention mechanisms to reduce memory usage from quadratic to linear in the sequence length. These models extend the maximum input sequence length from 512 to 4,096 tokens, which enhances their ability to model long-term dependencies and consequently yields optimal results on a variety of tasks. Inspired by the success of these long-sequence transformer models, we introduce two domain-enriched language models, Clinical-Longformer and Clinical-BigBird, which are pre-trained on large-scale clinical corpora. We evaluate both pre-trained models on 10 baseline tasks, including named entity recognition, question answering, and document classification. The results demonstrate that Clinical-Longformer and Clinical-BigBird consistently and significantly outperform ClinicalBERT as well as other short-sequence transformers on all downstream tasks. We have made the pre-trained models available for public download at: [https://huggingface.co/yikuan8/Clinical-Longformer].
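As a minimal usage sketch (assuming the Hugging Face `transformers` library is installed; the example clinical note below is hypothetical), the released Clinical-Longformer checkpoint can be loaded and applied to a long clinical note as follows:

```python
# Minimal sketch: loading the released Clinical-Longformer checkpoint from the
# Hugging Face Hub. Assumes the `transformers` library is installed; the input
# text is a hypothetical placeholder.
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("yikuan8/Clinical-Longformer")
model = AutoModel.from_pretrained("yikuan8/Clinical-Longformer")

# Clinical notes of up to 4,096 tokens can be encoded in a single pass,
# compared with the 512-token limit of ClinicalBERT.
note = "Patient admitted with shortness of breath and chest pain ..."
inputs = tokenizer(note, truncation=True, max_length=4096, return_tensors="pt")
outputs = model(**inputs)
embeddings = outputs.last_hidden_state  # contextual token embeddings
```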