Recent advances in Vector Space Models have significantly improved NLP applications such as neural machine translation and natural language generation. Although word co-occurrence in context has been widely exploited in both count-based and prediction-based distributional models, the role of syntactic dependencies in deriving distributional semantics has not yet been thoroughly investigated. By comparing various Vector Space Models on synonym detection with the TOEFL test set, we systematically study the salience of syntactic dependencies in accounting for distributional similarity. We separate syntactic dependencies into groups according to their grammatical roles and use context counting to construct the corresponding raw and SVD-compressed co-occurrence matrices. Moreover, using the same training hyperparameters and corpora, we include typical neural embeddings in the evaluation. We further examine the effectiveness of injecting human-compiled semantic knowledge into neural embeddings for computing distributional similarity. Our results show that syntactically conditioned contexts interpret lexical semantics better than unconditioned ones, and that retrofitting neural embeddings with semantic knowledge significantly improves synonym detection.
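To make the count-based side of this pipeline concrete, here is a minimal sketch of building a syntactically conditioned word-by-context matrix and compressing it with SVD. The toy dependency triples, the choice of relation group, and the SVD rank are illustrative assumptions, not the paper's actual corpus or configuration.

```python
# Sketch: dependency-conditioned context counting + SVD compression.
from collections import Counter

import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.linalg import svds

# Toy dependency triples: (head word, dependency relation, dependent word).
triples = [
    ("drink", "dobj", "coffee"), ("drink", "dobj", "tea"),
    ("drink", "nsubj", "she"), ("sip", "dobj", "coffee"),
    ("sip", "dobj", "tea"), ("sip", "nsubj", "he"),
]

# Condition contexts on one grammatical group, e.g. keep only object relations.
group = {"dobj"}
counts = Counter((h, d) for h, r, d in triples if r in group)

words = sorted({h for h, _ in counts})
ctxts = sorted({d for _, d in counts})
w_idx = {w: i for i, w in enumerate(words)}
c_idx = {c: j for j, c in enumerate(ctxts)}

# Raw word-by-context count matrix.
rows, cols, vals = zip(*((w_idx[h], c_idx[d], n) for (h, d), n in counts.items()))
M = coo_matrix((vals, (rows, cols)),
               shape=(len(words), len(ctxts)), dtype=float).tocsr()

# SVD-compressed vectors: rows of U * S are the dense word representations.
k = min(M.shape) - 1  # svds requires rank k < min dimension
U, S, _ = svds(M, k=k)
vectors = U * S

# Cosine similarity between two words, as used for synonym detection.
def cos(a, b):
    va, vb = vectors[w_idx[a]], vectors[w_idx[b]]
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb) + 1e-12))

print(cos("drink", "sip"))  # near-synonyms share object contexts
```

Switching the contents of `group` (e.g. to `{"nsubj"}` or the full relation set) changes which syntactic contexts condition the counts, which is the comparison the abstract describes.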
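Retrofitting, the knowledge-injection step evaluated above, can be sketched in a few lines in the spirit of Faruqui et al. (2015): each vector is iteratively averaged with its neighbors in a semantic lexicon while staying close to its pretrained value. The toy embeddings, the synonym graph, and the `alpha`/`beta` weights below are hypothetical inputs, not the paper's setup.

```python
# Sketch: retrofitting pretrained embeddings with a semantic lexicon.
import numpy as np

def retrofit(vectors, lexicon, iters=10, alpha=1.0, beta=1.0):
    """Nudge each vector toward its lexicon neighbors.

    vectors: dict word -> np.ndarray (pretrained embeddings)
    lexicon: dict word -> list of semantically related words
    """
    new = {w: v.copy() for w, v in vectors.items()}
    for _ in range(iters):
        for w, nbrs in lexicon.items():
            nbrs = [n for n in nbrs if n in new]
            if not nbrs:
                continue
            # Weighted average of the original vector and current neighbors.
            num = alpha * vectors[w] + beta * sum(new[n] for n in nbrs)
            new[w] = num / (alpha + beta * len(nbrs))
    return new

emb = {"happy": np.array([1.0, 0.0]),
       "glad": np.array([0.0, 1.0]),
       "sad": np.array([-1.0, 0.0])}
syn = {"happy": ["glad"], "glad": ["happy"]}
retro = retrofit(emb, syn)
print(retro["happy"], retro["glad"])  # pulled toward each other
```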