The modeling of probability distributions, in particular generative modeling and density estimation, has become an immensely popular subject in recent years by virtue of its outstanding performance on sophisticated data such as images and texts. Nevertheless, a theoretical understanding of its success is still incomplete. One mystery is the paradox between memorization and generalization: in theory, the model is trained to converge to the empirical distribution of the finite training samples, whereas in practice, the trained model can generate new samples or estimate the likelihood of unseen samples. Likewise, the overwhelming diversity of distribution learning models calls for a unified perspective on this subject. This paper provides a mathematical framework from which all the well-known models can be derived based on simple principles. To demonstrate its efficacy, we present a survey of our results on the approximation error, training error, and generalization error of these models, all of which can be established within this framework. In particular, the aforementioned paradox is resolved by proving that these models enjoy implicit regularization during training, so that the generalization error at early stopping avoids the curse of dimensionality. Furthermore, we provide some new results on landscape analysis and the mode collapse phenomenon.