We use paraphrases as a unique source of data to analyze contextualized embeddings, with a particular focus on BERT. Because paraphrases naturally encode consistent word and phrase semantics, they provide a distinctive lens for investigating properties of embeddings. Using the Paraphrase Database's alignments, we study words within paraphrases as well as phrase representations. We find that contextual embeddings effectively handle polysemous words, but in many cases give synonyms surprisingly different representations. We confirm previous findings that BERT is sensitive to word order, but find slightly different patterns from prior work in the degree of contextualization across BERT's layers.
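The synonym comparison described above can be sketched as a cosine-similarity check between the contextual embeddings of an aligned word pair; the vectors below are small placeholders standing in for real BERT outputs, not values from the study:

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical contextual embeddings for a word pair aligned across
# two paraphrases (placeholders, not actual BERT representations).
emb_a = np.array([0.20, 0.70, -0.10, 0.40])
emb_b = np.array([0.25, 0.65, -0.05, 0.35])

sim = cosine_similarity(emb_a, emb_b)
# A low similarity for an aligned synonym pair would illustrate the
# "surprisingly different representations" finding.
```

In practice one would extract the two vectors from a chosen BERT layer for each word occurrence before comparing them.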