Distributed representations of words encode lexical semantic information, but how is that information encoded in word embeddings? Focusing on the skip-gram with negative-sampling method, we show theoretically and experimentally that the squared norm of a word embedding encodes the information gain conveyed by the word, defined by the Kullback-Leibler divergence of the word's co-occurrence distribution from the unigram distribution of the corpus. Furthermore, through experiments on keyword extraction, hypernym prediction, and part-of-speech discrimination, we confirm that the KL divergence and the squared norm of the embedding serve as measures of the informativeness of a word, provided that the bias caused by word frequency is adequately corrected.
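For concreteness, the information gain referred to above can be written as follows; here $p(\cdot \mid w)$ denotes the co-occurrence (context) distribution of a word $w$, $p(\cdot)$ the unigram distribution of the corpus, and the symbol $u_w$ for the skip-gram with negative-sampling embedding of $w$ is introduced only for illustration (it is not notation fixed by this abstract):

$$
\mathrm{KL}\bigl(p(\cdot \mid w)\,\|\,p(\cdot)\bigr) \;=\; \sum_{w'} p(w' \mid w)\,\log \frac{p(w' \mid w)}{p(w')} .
$$

The claim is that the squared norm $\lVert u_w \rVert^2$ encodes this quantity, so that, roughly speaking, frequent but uninformative words have small norms while words whose contexts differ strongly from the corpus-wide distribution have large norms.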