Improved Biomedical Word Embeddings in the Transformer Era

Biomedical word embeddings are usually pre-trained on free-text corpora with neural methods that capture local and global distributional properties. They are leveraged in downstream tasks using various neural architectures that optimize task-specific objectives and may further tune the embeddings. Since 2018, however, there has been a marked shift from these static embeddings to contextual embeddings motivated by language models (e.g., ELMo, transformers such as BERT, and ULMFiT). These dynamic embeddings have the added benefit of being able to distinguish homonyms and acronyms given their context. However, static embeddings are still relevant in low-resource settings (e.g., smart devices, IoT elements) and for studying lexical semantics from a computational linguistics perspective. In this paper, we jointly learn word and concept embeddings by first using the skip-gram method and then fine-tuning them with correlational information manifesting in co-occurring Medical Subject Heading (MeSH) concepts in biomedical citations. This fine-tuning is accomplished with the BERT transformer architecture in the two-sentence input mode with a classification objective that captures MeSH pair co-occurrence. In essence, we repurpose a transformer architecture (typically used to generate dynamic embeddings) to improve static embeddings using concept correlations. We evaluate these tuned static embeddings using multiple word relatedness datasets developed by previous efforts. Without selectively culling concepts and terms (as was pursued by previous efforts), we believe we offer the most exhaustive evaluation of static embeddings to date, with clear performance improvements across the board. We provide our code and embeddings for public use for downstream applications and research endeavors: https://github.com/bionlproc/BERT-CRel-Embeddings
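The abstract does not spell out how training examples for the MeSH pair co-occurrence objective are built. The following is only a minimal sketch of one plausible construction, under stated assumptions: each citation is given as a list of MeSH codes, a pair is labeled 1 if the two codes share at least one citation and 0 otherwise, and negatives are sampled uniformly from never-co-occurring pairs. The function names, the `negatives_per_positive` parameter, and the placeholder codes such as `MESH_A` are hypothetical and are not taken from the paper or its repository.

```python
import itertools
import random


def cooccurring_pairs(citations):
    """Return the set of unordered code pairs that share at least one citation."""
    pairs = set()
    for codes in citations:
        # Sort so that every pair is stored as (smaller, larger).
        for a, b in itertools.combinations(sorted(set(codes)), 2):
            pairs.add((a, b))
    return pairs


def build_examples(citations, negatives_per_positive=1, seed=0):
    """Build (code_a, code_b, label) triples for a pair-classification objective.

    Assumption: label 1 means the pair co-occurs in some citation,
    label 0 means it never does. Enumerating all pairs is quadratic in the
    vocabulary size, so this is only suitable for toy inputs.
    """
    rng = random.Random(seed)
    positives = cooccurring_pairs(citations)
    vocab = sorted({code for codes in citations for code in codes})
    candidates = sorted(set(itertools.combinations(vocab, 2)) - positives)
    n_neg = min(len(candidates), negatives_per_positive * len(positives))
    negatives = rng.sample(candidates, n_neg)
    examples = [(a, b, 1) for a, b in sorted(positives)]
    examples += [(a, b, 0) for a, b in negatives]
    rng.shuffle(examples)
    return examples


# Toy input with placeholder codes (not real MeSH identifiers).
citations = [
    ["MESH_A", "MESH_B", "MESH_C"],
    ["MESH_C", "MESH_D"],
    ["MESH_E"],
]
examples = build_examples(citations)
for code_a, code_b, label in examples:
    print(code_a, code_b, label)
```

In a setup like the one the abstract describes, each `(code_a, code_b)` pair would presumably be fed as the two segments of a BERT-style two-sentence input, with the label as the classification target. The balanced one-to-one negative sampling here is a design choice of this sketch, not a reported detail.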

