We introduce a simple modification to the standard maximum likelihood estimation (MLE) framework. Rather than maximizing a single unconditional likelihood of the data under the model, we maximize a family of \textit{noise conditional} likelihoods of the data perturbed by a continuum of noise levels. We find that models trained this way are more robust to noise, attain higher test likelihoods, and generate higher-quality images. They can also be sampled from via a novel score-based sampling scheme that combats the classical \textit{covariate shift} problem arising during sample generation in autoregressive models. Applying this augmentation to autoregressive image models, we obtain 3.32 bits per dimension on the ImageNet $64{\times}64$ dataset and substantially improve the quality of generated samples in terms of the Fréchet Inception Distance (FID), from 37.50 to 12.09 on the CIFAR-10 dataset.
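The core training idea can be sketched in a toy setting: perturb each data point with Gaussian noise at a randomly sampled level $\sigma$, then evaluate the likelihood of the noisy data under a model conditioned on $\sigma$, averaging over a continuum of noise levels. The sketch below is purely illustrative, with a hand-specified conditional Gaussian standing in for the autoregressive model; all function names are hypothetical and not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def noise_conditional_nll(x, sigma, model_mean, model_std):
    """Negative log-likelihood of noise-perturbed data under a toy
    Gaussian 'model' that is conditioned on the noise level sigma.
    (Illustrative stand-in for a noise-conditional autoregressive model.)"""
    x_noisy = x + sigma * rng.standard_normal(x.shape)  # perturb the data
    var = model_std(sigma) ** 2
    # Gaussian NLL, averaged over the batch.
    return 0.5 * np.mean((x_noisy - model_mean(sigma)) ** 2 / var
                         + np.log(2 * np.pi * var))

# Toy standard-normal "data".
x = rng.standard_normal(1000)

# A hand-specified conditional model whose variance tracks data + noise.
model_mean = lambda sigma: 0.0
model_std = lambda sigma: np.sqrt(1.0 + sigma ** 2)

# Monte Carlo average over a continuum of noise levels.
sigmas = rng.uniform(0.0, 1.0, size=16)
loss = np.mean([noise_conditional_nll(x, s, model_mean, model_std)
                for s in sigmas])
```

In a real model, `model_mean` and `model_std` would be replaced by a neural network taking $\sigma$ as an extra conditioning input, and `loss` would be minimized by gradient descent.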