Vector-based word representations help countless Natural Language Processing (NLP) tasks capture both the semantic and syntactic regularities of language. In this paper, we present the characteristics of existing word embedding approaches and analyze them with respect to a range of classification tasks. We categorize the methods into two main groups. Traditional approaches mostly use matrix factorization to produce word representations and are unable to capture the semantic and syntactic regularities of language very well. Neural-network-based approaches, on the other hand, can capture sophisticated regularities of language and preserve word relationships in the generated word representations. We report experimental results on multiple classification tasks and highlight the scenarios where one approach performs better than the rest.