Multilingual Word Embeddings (MWEs) represent words from multiple languages in a single distributional vector space. Unsupervised MWE (UMWE) methods acquire multilingual embeddings without cross-lingual supervision, which is a significant advantage over traditional supervised approaches and opens many new possibilities for low-resource languages. Prior art for learning UMWEs, however, merely relies on a number of independently trained Unsupervised Bilingual Word Embeddings (UBWEs) to obtain multilingual embeddings. These methods fail to leverage the interdependencies that exist among many languages. To address this shortcoming, we propose a fully unsupervised framework for learning MWEs that directly exploits the relations between all language pairs. Our model substantially outperforms previous approaches in experiments on multilingual word translation and cross-lingual word similarity. In addition, our model even beats supervised approaches trained with cross-lingual resources.
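To make the architectural contrast concrete, below is a minimal sketch of the shared-space parameterization the abstract alludes to: one mapping per language into a single common space, rather than one independently trained mapping per language pair. The language codes, dimensions, random initialization, and the orthogonal-matrix parameterization are illustrative assumptions, and the unsupervised training objective that would actually learn these mappings is omitted; this is not the paper's procedure, only the structure it argues for.

```python
import numpy as np

# Hypothetical setup: pre-trained monolingual embeddings for three languages.
# Vocabulary size, dimensionality, and random vectors are placeholders.
rng = np.random.default_rng(0)
dim, vocab = 300, 5000
langs = ["en", "fr", "zh"]
emb = {l: rng.standard_normal((vocab, dim)) for l in langs}

def random_orthogonal(d, rng):
    # QR decomposition of a Gaussian matrix yields a random orthogonal matrix;
    # in a real system these mappings would be learned, not sampled.
    q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return q

# One mapping per language into a single shared space (not per language pair).
W = {l: random_orthogonal(dim, rng) for l in langs}

def to_shared(lang, vecs):
    return vecs @ W[lang]

def translate(src, tgt, vecs):
    # Map into the shared space, then back out with the inverse of the target
    # language's mapping (its transpose, since W[tgt] is orthogonal).
    return to_shared(src, vecs) @ W[tgt].T

fr_like = translate("en", "fr", emb["en"][:10])
print(fr_like.shape)  # (10, 300)
```

Under this parameterization, N languages share only N mappings across all N(N-1) translation directions, so evidence from any language pair constrains every mapping; composing independently trained UBWEs instead requires a separate mapping per pair and cannot share signal across pairs.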