Learning a parametric model of a data distribution is a well-known statistical problem that has seen renewed interest as it is brought to scale in deep learning. Framing the problem as a self-supervised task, in which data samples are discriminated from noise samples, is at the core of state-of-the-art methods, beginning with Noise-Contrastive Estimation (NCE). Yet such contrastive learning requires a good noise distribution, which is hard to specify; domain-specific heuristics are therefore widely used. While a comprehensive theory is missing, it is widely assumed that the optimal noise should in practice be made equal to the data, both in distribution and in proportion. This setting underlies Generative Adversarial Networks (GANs) in particular. Here, we empirically and theoretically challenge this assumption on the optimal noise. We show that deviating from this assumption can actually lead to better statistical estimators, in terms of asymptotic variance. In particular, the optimal noise distribution is different from the data distribution and can even belong to a different family.
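To make the contrastive framing concrete, the following is a minimal sketch of the NCE objective in one dimension, not the paper's method: parameters are estimated by logistic discrimination between data samples and samples from a noise distribution, with the noise chosen (as in the heuristic questioned above) to match the data's moments. The model is kept as a normalized Gaussian for simplicity, whereas NCE is usually applied to unnormalized models with the normalizing constant as an extra parameter; all names (`nce_loss`, `nu`, etc.) are illustrative.

```python
# Minimal NCE sketch (assumed setup, not the paper's implementation).
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)

# "Data" drawn from an unknown 1D Gaussian we want to estimate.
x_data = rng.normal(loc=2.0, scale=1.5, size=5000)

# Noise distribution: a Gaussian matched to the data moments, mirroring the
# common heuristic of making the noise close to the data.
noise_mu, noise_sigma = x_data.mean(), x_data.std()
nu = 1.0                                   # noise-to-data sample ratio
x_noise = rng.normal(noise_mu, noise_sigma, size=int(nu * len(x_data)))

def log_model(x, theta):
    """Log-density of the parametric model, theta = (mean, log_std).
    Kept normalized here; full NCE would also estimate a log-partition term."""
    mu, log_sigma = theta
    return norm.logpdf(x, loc=mu, scale=np.exp(log_sigma))

def log_noise(x):
    """Log-density of the known noise distribution."""
    return norm.logpdf(x, loc=noise_mu, scale=noise_sigma)

def nce_loss(theta):
    """Binary logistic loss for discriminating data from noise samples."""
    # Log-odds that a sample came from the model rather than the noise.
    g_data = log_model(x_data, theta) - log_noise(x_data) - np.log(nu)
    g_noise = log_model(x_noise, theta) - log_noise(x_noise) - np.log(nu)
    # -log sigmoid(g) for data, -log sigmoid(-g) for noise, weighted by nu.
    loss_data = np.mean(np.logaddexp(0.0, -g_data))
    loss_noise = np.mean(np.logaddexp(0.0, g_noise))
    return loss_data + nu * loss_noise

theta_hat = minimize(nce_loss, x0=np.array([0.0, 0.0]), method="L-BFGS-B").x
print("estimated mean, std:", theta_hat[0], np.exp(theta_hat[1]))
```

Varying `noise_mu`, `noise_sigma`, and `nu` in this sketch is one way to probe how the noise choice affects the variance of the resulting estimator, which is the question the abstract addresses.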