Understanding human language has long been a key challenge on the way to intelligent machines. The study of meaning in natural language processing (NLP) rests on the distributional hypothesis, according to which linguistic elements derive their meaning from the words that co-occur with them in context. The idea of a distributed representation of a concept mirrors the working of the human mind: the meaning of a word is spread across many neurons, so a loss of activation only slightly affects memory retrieval. Neural word embeddings transformed the field of NLP by delivering substantial improvements across NLP tasks. In this survey, we provide a comprehensive literature review of neural word embeddings. We lay out the theoretical foundations and describe existing work through the interplay between word embeddings and language modelling. Our coverage is broad, spanning early word embeddings, embeddings targeting specific semantic relations, sense embeddings, morpheme embeddings, and, finally, contextual representations. We conclude by describing the benchmark datasets used to evaluate word embeddings, both intrinsically and on downstream tasks, along with the performance results achieved with word embeddings.