A prominent family of methods for learning data distributions relies on density ratio estimation (DRE), where a model is trained to $\textit{classify}$ between data samples and samples from some reference distribution. These techniques are successful in simple low-dimensional settings but fail to achieve good results on complex high-dimensional data, such as images. A different family of methods for learning distributions is that of denoising diffusion models (DDMs), in which a model is trained to $\textit{denoise}$ data samples. These approaches achieve state-of-the-art results in image, video, and audio generation. In this work, we present $\textit{Classification Diffusion Models}$ (CDMs), a generative technique that adopts the denoising-based formalism of DDMs while making use of a classifier that predicts the amount of noise added to a clean signal, similarly to DRE methods. Our approach is based on the observation that an MSE-optimal denoiser for white Gaussian noise can be expressed in terms of the gradient of a cross-entropy-optimal classifier for predicting the noise level. As we illustrate, CDMs achieve better denoising results than DDMs and lead to at least comparable FID in image generation. CDMs are also capable of highly efficient one-step exact likelihood estimation, achieving state-of-the-art results among methods that use a single step. Code is available on the project's webpage at https://shaharYadin.github.io/CDM/ .
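The relation between a noise-level classifier and a denoiser can be checked numerically on a toy problem. The sketch below is not the paper's implementation. It assumes 1D Gaussian data, additive noise without signal rescaling, equal class priors, and an extra pure-noise reference class with known density; all names and constants are hypothetical. Under these assumptions, Bayes' rule turns differences of classifier log-probability gradients into differences of scores, and Tweedie's formula turns the score into the MSE-optimal denoiser.

```python
import numpy as np

S_DATA = 2.0                        # std of the toy clean data (assumed)
SIGMAS = np.array([0.5, 1.0, 2.0])  # noise levels the classifier predicts (assumed)
SIGMA_REF = 5.0                     # std of the pure-noise reference class (assumed)


def log_gauss(y, std):
    """Log-density of a zero-mean Gaussian with the given std."""
    return -0.5 * (y / std) ** 2 - np.log(std) - 0.5 * np.log(2 * np.pi)


def log_posterior(y):
    """Log-probabilities of the cross-entropy-optimal classifier at scalar y.

    With equal priors, the optimal classifier is the Bayes posterior over
    the noisy-data classes plus the reference class (last entry).
    """
    stds = np.append(np.sqrt(S_DATA**2 + SIGMAS**2), SIGMA_REF)
    logits = log_gauss(y, stds)
    return logits - np.logaddexp.reduce(logits)


def grad_log_posterior(y, eps=1e-5):
    """Central finite-difference gradient of every class log-probability."""
    return (log_posterior(y + eps) - log_posterior(y - eps)) / (2 * eps)


def denoise_from_classifier(y, k):
    """Denoise y at noise level k using only classifier gradients."""
    g = grad_log_posterior(y)
    ref_score = -y / SIGMA_REF**2        # known score of the reference density
    score_k = g[k] - g[-1] + ref_score   # Bayes' rule: the shared p(y) term cancels
    return y + SIGMAS[k] ** 2 * score_k  # Tweedie's formula


def denoise_closed_form(y, k):
    """MSE-optimal denoiser for Gaussian data, for comparison."""
    return y * S_DATA**2 / (S_DATA**2 + SIGMAS[k] ** 2)


if __name__ == "__main__":
    for k in range(len(SIGMAS)):
        print(k, denoise_from_classifier(1.3, k), denoise_closed_form(1.3, k))
```

In this toy case the two denoisers agree up to finite-difference error. A trained classifier would replace `log_posterior`, with automatic differentiation in place of finite differences.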