A widely acknowledged shortcoming of WordNet is that it does not distinguish word meanings that are systematically related (polysemy) from those that share a word form only by coincidence (homonymy). Several previous works have attempted to fill this gap by inferring the distinction with computational methods. We revisit this task and exploit recent advances in language modelling to synthesise homonymy annotation for Princeton WordNet. Previous approaches treat the problem as one of clustering; by contrast, our method links WordNet to the Oxford English Dictionary, which contains the information we need. To perform this alignment, we pair definitions according to their proximity in an embedding space produced by a Transformer model. Despite the simplicity of this approach, our best model attains an F1 of 0.97 on an evaluation set that we annotate. The outcome of our work is a high-quality homonymy annotation layer for Princeton WordNet, which we release.
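The alignment step described above can be illustrated with a minimal sketch. It is not the implementation behind the reported results: the `embed` function is a toy bag-of-words stand-in for a Transformer encoder, the sense identifiers and definitions are invented for illustration, and the sketch assumes that the dictionary side groups senses under separate headword entries, so that two WordNet senses aligned to different entries are treated as homonymous.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a Transformer sentence encoder (assumption):
    a sparse bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def align(wordnet_defs, dict_defs):
    """Map each WordNet sense id to the (entry, sense) key of the
    dictionary definition that is nearest in the embedding space."""
    dict_vecs = {key: embed(text) for key, text in dict_defs.items()}
    mapping = {}
    for sense_id, text in wordnet_defs.items():
        vec = embed(text)
        mapping[sense_id] = max(dict_vecs, key=lambda k: cosine(vec, dict_vecs[k]))
    return mapping


def homonymy_groups(mapping):
    """Group WordNet senses by the dictionary entry they were aligned to.
    Senses in different groups are treated as homonyms (assumption)."""
    groups = {}
    for sense_id, (entry, _sense) in mapping.items():
        groups.setdefault(entry, []).append(sense_id)
    return groups


# Invented example data for the word "bank"; ids and glosses are hypothetical.
wordnet_defs = {
    "wn_bank_1": "sloping land beside a body of water",
    "wn_bank_2": "a financial institution that accepts deposits",
    "wn_bank_3": "a building in which financial business is transacted",
}
dict_defs = {
    ("bank_1", 1): "the sloping land along the edge of a river or lake",
    ("bank_2", 1): "an institution for financial deposits and loans",
    ("bank_2", 2): "the building where a financial institution does business",
}

mapping = align(wordnet_defs, dict_defs)
groups = homonymy_groups(mapping)
print(groups)
```

In this toy example the river sense ends up alone under one entry while the two financial senses share another, which is the kind of homonymy grouping the annotation layer is meant to encode. With a real encoder, only `embed` and `cosine` would need to change.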