This paper presents a multilingual study of word meaning representations in context. We assess the ability of both static and contextualized models to adequately represent different lexical-semantic relations, such as homonymy and synonymy. To do so, we created a new multilingual dataset that allows us to perform a controlled evaluation of several factors, such as the impact of the surrounding context and the overlap between words conveying the same or different senses. A systematic assessment on four scenarios shows that the best monolingual Transformer-based models can adequately disambiguate homonyms in context. However, because they rely heavily on context, these models fail to represent words with different senses when those words occur in similar sentences. Experiments are performed in Galician, Portuguese, English, and Spanish, and both the dataset (with more than 3,000 evaluation items) and new models are freely released with this study.