One of the central aspects of contextualised language models is that they should be able to distinguish the meanings of lexically ambiguous words based on their contexts. In this paper, we investigate the extent to which the contextualised embeddings of word forms that display multiplicity of sense reflect the traditional distinction between polysemy and homonymy. To this end, we introduce an extended, human-annotated dataset of graded word sense similarity and co-predication acceptability, and evaluate how well the similarity of embeddings predicts similarity in meaning. Both types of human judgements indicate that the similarity of polysemic interpretations lies on a continuum between identity of meaning and homonymy. However, we also observe significant differences within the similarity ratings of polysemes, which form consistent patterns for different types of polysemic sense alternation. Our dataset thus appears to capture a substantial part of the complexity of lexical ambiguity and can provide a realistic test bed for contextualised embeddings. Among the tested models, BERT Large shows the strongest correlation with the collected word sense similarity ratings, but struggles to consistently replicate the observed similarity patterns. When clustering ambiguous word forms based on their embeddings, the model discerns homonyms and some types of polysemic alternation with high confidence, but consistently fails for others.
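The evaluation described above, which checks how well embedding similarity predicts human sense-similarity ratings, can be sketched in a few lines. This is a minimal sketch under assumptions the abstract does not state: it takes cosine similarity between a target word's contextualised embeddings in two sentences as the model's similarity score, and Spearman's rank correlation against mean human ratings as the measure of agreement. The helper names (`cosine_similarity`, `average_ranks`, `pearson`, `spearman`) are hypothetical, and the toy 3-dimensional vectors and ratings are invented placeholders standing in for real model outputs (e.g. BERT Large token embeddings) and the collected annotations.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (assumes non-zero norms)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data: (embedding of the target word in context A,
# embedding in context B, mean human similarity rating).
items = [
    ([1.0, 0.0, 0.0], [1.0, 0.1, 0.0], 4.8),  # near-identical sense
    ([1.0, 0.0, 0.0], [0.7, 0.7, 0.0], 3.1),  # a polysemous alternation
    ([1.0, 0.0, 0.0], [0.0, 1.0, 0.2], 1.2),  # homonymous readings
]
model_sims = [cosine_similarity(a, b) for a, b, _ in items]
human_ratings = [r for _, _, r in items]
rho = spearman(model_sims, human_ratings)
print(f"Spearman rho = {rho:.2f}")  # prints "Spearman rho = 1.00" for this toy data
```

A rank correlation is used here because human ratings on a graded scale need not be linearly related to cosine values; only their ordering is compared. In practice, `scipy.stats.spearmanr` computes the same statistic, and this standard-library version simply makes the tie handling explicit.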