In a recent paper, Levy and Goldberg pointed out an interesting connection between prediction-based word embedding models and count models based on pointwise mutual information. Under certain conditions, they showed that both models end up optimizing equivalent objective functions. This paper explores this connection in more detail and lays out the factors that lead to differences between these models. We find that the most relevant differences from an optimization perspective are that (i) predict models work in a low-dimensional space where embedding vectors can interact heavily, and (ii) since predict models have fewer parameters, they are less prone to overfitting. Motivated by the insights of our analysis, we show how count models can be regularized in a principled manner and provide closed-form solutions for L1 and L2 regularization. Finally, we propose a new embedding model with a convex objective and the additional benefit of being intelligible.
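To make the connection concrete, the result of Levy and Goldberg can be stated as follows: under the conditions alluded to above, skip-gram with negative sampling (SGNS) is optimized when the dot product of a word vector $\vec{w}$ and a context vector $\vec{c}$ recovers the pointwise mutual information of the pair, shifted by the log of the number of negative samples $k$:
\[
\vec{w} \cdot \vec{c} \;=\; \mathrm{PMI}(w, c) - \log k \;=\; \log \frac{P(w, c)}{P(w)\,P(c)} - \log k .
\]
In this sense, the predict model implicitly factorizes the shifted PMI matrix that count models construct explicitly.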
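As a minimal illustration of what closed-form regularization can look like (a generic sketch; the scalar objective and the symbols $m$, $x$, $\lambda$ are ours, not necessarily the paper's exact formulation): minimizing the separable objective $\frac{1}{2}(x - m)^2 + \lambda\,\Omega(x)$ elementwise admits the well-known solutions
\[
x^{*}_{\mathrm{L2}} = \frac{m}{1 + 2\lambda} \quad \text{for } \Omega(x) = x^{2},
\qquad
x^{*}_{\mathrm{L1}} = \operatorname{sign}(m)\,\max\bigl(|m| - \lambda,\, 0\bigr) \quad \text{for } \Omega(x) = |x| .
\]
That is, L2 regularization uniformly shrinks the unregularized solution, while L1 soft-thresholds it and zeroes out small entries, which is one natural way to regularize the entries of a count matrix.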