Transformer-based models, such as BERT, have dramatically improved performance on various natural language processing tasks. The clinical knowledge enriched model, namely ClinicalBERT, also achieved state-of-the-art results on clinical named entity recognition and natural language inference tasks. One of the core limitations of these transformers is their substantial memory consumption due to the full self-attention mechanism. To overcome this, long-sequence transformer models, e.g., Longformer and BigBird, were proposed with the idea of sparse attention mechanisms, which reduce memory usage from quadratic to linear in the sequence length. These models extend the maximum input sequence length from 512 to 4,096 tokens, which enhances their ability to model long-term dependencies and consequently leads to strong results on a variety of tasks. Inspired by the success of these long-sequence transformer models, we introduce two domain enriched language models, namely Clinical-Longformer and Clinical-BigBird, which are pre-trained on large-scale clinical corpora. We evaluate both pre-trained models on 10 baseline tasks, including named entity recognition, question answering, and document classification. The results demonstrate that Clinical-Longformer and Clinical-BigBird consistently and significantly outperform ClinicalBERT as well as other short-sequence transformers on all downstream tasks. We have made our source code available at [https://github.com/luoyuanlab/Clinical-Longformer] and the pre-trained models available for public download at [https://huggingface.co/yikuan8/Clinical-Longformer].
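As a minimal illustration of how the released checkpoint can be used, the sketch below loads Clinical-Longformer from the public download link above with the standard Hugging Face transformers API and encodes a single note of up to 4,096 tokens. The clinical note shown is hypothetical, and the snippet assumes the generic AutoTokenizer/AutoModel interface rather than any project-specific wrapper.

```python
# Minimal sketch (assumption): loading the publicly released Clinical-Longformer
# checkpoint with the Hugging Face transformers library.
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("yikuan8/Clinical-Longformer")
model = AutoModel.from_pretrained("yikuan8/Clinical-Longformer")

# Long-sequence transformers accept inputs up to 4,096 tokens (vs. 512 for BERT).
note = "The patient was admitted with shortness of breath ..."  # hypothetical clinical note
inputs = tokenizer(note, truncation=True, max_length=4096, return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, sequence_length, hidden_size)
```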