Acquisition of multilingual training data continues to be a challenge in word sense disambiguation (WSD). To address this problem, unsupervised approaches have been proposed to automatically generate sense annotations for training supervised WSD systems. We present three new methods for creating sense-annotated corpora that leverage translations, parallel bitexts, lexical resources, and contextual and synset embeddings. Our semi-supervised method applies machine translation to transfer existing sense annotations to other languages. Our two unsupervised methods refine sense annotations produced by a knowledge-based WSD system via lexical translations in a parallel corpus. We obtain state-of-the-art results on standard WSD benchmarks.