State-of-the-art methods for Word Sense Disambiguation (WSD) combine two components: the power of pre-trained language models and a propagation method that extends the coverage of such models. This propagation is needed because current sense-annotated corpora do not cover many of the senses in the underlying sense inventory (usually WordNet). At the same time, unambiguous words make up a large portion of all words in WordNet, yet they are poorly covered in existing sense-annotated corpora. In this paper, we propose a simple method to provide annotations for most unambiguous words in a large corpus. We introduce the UWA (Unambiguous Word Annotations) dataset and show how a state-of-the-art propagation-based model can use it to extend the coverage and quality of its word sense embeddings by a significant margin, improving on its original WSD results.
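The core idea of annotating unambiguous words can be illustrated with a minimal sketch. It is not the procedure used to build UWA, whose details are not given in this abstract. It assumes the input is already lemmatized and POS-tagged, and it uses a hypothetical toy inventory (`TOY_INVENTORY`) with made-up sense identifiers in place of WordNet; the function name `annotate_unambiguous` is likewise illustrative.

```python
from typing import Dict, List, Tuple

# Hypothetical toy sense inventory: (lemma, POS) -> sense identifiers.
# A real pipeline would query WordNet instead; these IDs are invented.
TOY_INVENTORY: Dict[Tuple[str, str], List[str]] = {
    ("bank", "n"): ["bank%river", "bank%finance"],
    ("oxygen", "n"): ["oxygen%element"],
    ("photosynthesis", "n"): ["photosynthesis%process"],
    ("run", "v"): ["run%move", "run%operate"],
}


def annotate_unambiguous(
    tagged_tokens: List[Tuple[str, str]],
    inventory: Dict[Tuple[str, str], List[str]],
) -> List[Tuple[int, str, str]]:
    """Return (token index, lemma, sense) for every token whose
    (lemma, POS) pair has exactly one sense in the inventory."""
    annotations = []
    for index, (lemma, pos) in enumerate(tagged_tokens):
        senses = inventory.get((lemma.lower(), pos), [])
        # A word is unambiguous when the inventory lists a single sense,
        # so that sense can be assigned without any disambiguation model.
        if len(senses) == 1:
            annotations.append((index, lemma, senses[0]))
    return annotations


# Assumed input format: pre-lemmatized tokens with coarse POS tags.
sentence = [
    ("plant", "n"),
    ("release", "v"),
    ("oxygen", "n"),
    ("during", "r"),
    ("photosynthesis", "n"),
    ("near", "r"),
    ("bank", "n"),
]

for index, lemma, sense in annotate_unambiguous(sentence, TOY_INVENTORY):
    print(index, lemma, sense)
```

In this sketch, "oxygen" and "photosynthesis" receive annotations because each has a single sense, while "bank" is skipped because it has two. Words missing from the inventory are also skipped rather than guessed.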