Natural Language Understanding has seen a growing number of publications in recent years, especially since robust word embedding models became prominent and proved able to capture and represent semantic relationships from massive amounts of data. Nevertheless, traditional models often fall short on intrinsic linguistic issues such as polysemy and homonymy. Any expert system that uses natural language at its core can be affected by a weak semantic representation of text, resulting in inaccurate outcomes based on poor decisions. To mitigate such issues, we propose a novel approach called Most Suitable Sense Annotation (MSSA), which disambiguates and annotates each word by its specific sense, considering the semantic effects of its context. Our approach brings three main contributions to semantic representation: (i) an unsupervised technique that disambiguates and annotates words by their senses, (ii) a multi-sense embeddings model that can be extended to any traditional word embedding algorithm, and (iii) a recurrent methodology that allows our models to be reused and their representations refined. We test our approach on six different benchmarks for the word similarity task, showing that it can produce state-of-the-art results and outperform several more complex state-of-the-art systems.
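To make the idea of context-driven sense annotation concrete, the following is a minimal sketch and not the MSSA algorithm itself, whose details are not given here. It assumes that each ambiguous word comes with a small inventory of candidate senses, each represented by a vector, and it picks the sense closest (by cosine similarity) to the average vector of the neighboring words' senses. The function names, the `word#sense` labels, the window size, and the two-dimensional toy vectors are all hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Component-wise average of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def annotate_senses(tokens, sense_vectors, window=2):
    """Replace each known token with the sense label closest to its context.

    sense_vectors maps a word to a dict {sense_label: vector}.
    Tokens without an entry are left unchanged (an assumption of this sketch).
    """
    annotated = []
    for i, token in enumerate(tokens):
        senses = sense_vectors.get(token)
        if not senses:
            annotated.append(token)
            continue

        # Collect every candidate sense vector of the neighbors in the window.
        context = []
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j == i:
                continue
            context.extend(sense_vectors.get(tokens[j], {}).values())

        if not context:
            # No usable context: fall back to the first listed sense.
            annotated.append(next(iter(senses)))
            continue

        context_vector = mean_vector(context)
        best = max(senses, key=lambda label: cosine(senses[label], context_vector))
        annotated.append(best)
    return annotated


# Hypothetical toy sense inventory with hand-picked 2-d vectors.
TOY_SENSES = {
    "bank": {"bank#finance": [1.0, 0.0], "bank#river": [0.0, 1.0]},
    "money": {"money#currency": [0.9, 0.1]},
    "water": {"water#liquid": [0.1, 0.9]},
}

print(annotate_senses(["money", "bank"], TOY_SENSES))
print(annotate_senses(["water", "bank"], TOY_SENSES))
```

In this toy setup the same surface word "bank" receives a different sense label depending on its neighbor, which is the kind of annotated text a multi-sense embeddings model could then be trained on with a standard word embedding algorithm.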