Noise Contrastive Estimation (NCE) is a popular approach for learning probability density functions that are parameterized only up to a constant of proportionality. The main idea is to set up a classification problem that distinguishes training data from samples drawn from an easy-to-sample noise distribution $q$, in a way that avoids computing the partition function. It is well known that the choice of $q$ can severely impact the computational and statistical efficiency of NCE. In practice, a common choice for $q$ is a Gaussian that matches the mean and covariance of the data. In this paper, we show that such a choice can make the Hessian of the loss exponentially ill-conditioned in the ambient dimension, even for very simple data distributions. As a consequence, both the statistical and the algorithmic complexity of NCE under such a choice of $q$ will be problematic in practice, suggesting that more complex noise distributions are essential to the success of NCE.
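To make the setup concrete, the following is a minimal sketch of the standard binary NCE objective with a moment-matched Gaussian noise distribution, assuming equal numbers of data and noise samples. The function names (`log_gaussian`, `moment_matched_gaussian`, `nce_loss`) and the NumPy formulation are illustrative assumptions, not code from the paper.

```python
import numpy as np


def log_gaussian(x, mean, cov):
    """Log-density of N(mean, cov) evaluated at each row of x (shape (n, d))."""
    d = mean.shape[0]
    chol = np.linalg.cholesky(cov)               # cov = chol @ chol.T
    sol = np.linalg.solve(chol, (x - mean).T)    # whitened residuals, shape (d, n)
    quad = np.sum(sol ** 2, axis=0)              # Mahalanobis distances
    logdet = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (quad + logdet + d * np.log(2.0 * np.pi))


def moment_matched_gaussian(data):
    """Mean and covariance of the data, used as the parameters of the noise q."""
    return data.mean(axis=0), np.cov(data, rowvar=False)


def nce_loss(log_model, log_noise, data, noise):
    """Binary NCE loss (assumed form: equal data/noise sample counts).

    log_model returns the log of the unnormalized model density, with any
    learned log-normalizer folded in as an ordinary parameter; log_noise
    returns log q. The classifier logit is z = log_model - log_noise.
    """
    z_data = log_model(data) - log_noise(data)
    z_noise = log_model(noise) - log_noise(noise)
    # -log sigmoid(z) = softplus(-z) and -log(1 - sigmoid(z)) = softplus(z);
    # logaddexp(0, t) is a numerically stable softplus(t).
    return np.mean(np.logaddexp(0.0, -z_data)) + np.mean(np.logaddexp(0.0, z_noise))
```

The partition function never appears: the loss only needs the log-ratio between the unnormalized model and $q$ at data and noise points. As a sanity check, if the model coincides with $q$, every logit is zero and the loss equals $2\log 2$, the value of a classifier that cannot tell data from noise.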