Learning image classification and image generation using the same set of network parameters is a challenging problem. Recent approaches that perform well in one task often perform poorly in the other. This work introduces an energy-based classifier and generator, called EGC, which achieves superior performance in both tasks using a single neural network. Unlike a conventional classifier that outputs a label given an image (i.e., a conditional distribution $p(y|\mathbf{x})$), the forward pass in EGC is a classifier that outputs a joint distribution $p(\mathbf{x},y)$, which enables an image generator in its backward pass by marginalizing out the label $y$. Concretely, the forward pass estimates the energy and the classification probability of a noisy image, and the backward pass estimates the score function that is used to denoise it. EGC achieves generation results competitive with state-of-the-art approaches on ImageNet-1k, CelebA-HQ and LSUN Church, while achieving superior classification accuracy and robustness against adversarial attacks on CIFAR-10. This work represents the first successful attempt to excel in both tasks simultaneously using a single set of network parameters. We believe that EGC bridges the gap between discriminative and generative learning.
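The forward/backward relationship described above can be illustrated with a minimal sketch. It is not the EGC implementation: it assumes the common energy-based parameterization $p(\mathbf{x},y) \propto \exp(f(\mathbf{x})[y])$, so that marginalizing out $y$ gives the energy $E(\mathbf{x}) = -\log\sum_y \exp(f(\mathbf{x})[y])$, and it replaces the neural network with a hypothetical linear model `f(x) = W x + b` whose gradient can be written by hand instead of obtained through backpropagation. The names `forward`, `score`, and `denoise_step` are illustrative only.

```python
import math

def logits(W, b, x):
    # Toy stand-in for the network: f(x) = W x + b (assumption, not EGC's architecture).
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]

def logsumexp(v):
    # Numerically stable log(sum(exp(v))).
    m = max(v)
    return m + math.log(sum(math.exp(t - m) for t in v))

def forward(W, b, x):
    # "Forward pass": energy E(x) = -logsumexp(f(x)) and class probabilities p(y|x).
    f = logits(W, b, x)
    lse = logsumexp(f)
    energy = -lse
    probs = [math.exp(t - lse) for t in f]
    return energy, probs

def score(W, b, x):
    # "Backward pass": score = -dE/dx. For the linear toy model this is
    # sum_y p(y|x) * W[y]; a real network would use automatic differentiation.
    _, probs = forward(W, b, x)
    return [sum(p * row[j] for p, row in zip(probs, W)) for j in range(len(x))]

def denoise_step(W, b, x, step=0.1):
    # One gradient step along the score, which moves x toward lower energy.
    s = score(W, b, x)
    return [xi + step * si for xi, si in zip(x, s)]

# Hypothetical parameters: 3 classes, 2-dimensional "image".
W = [[1.0, -0.5], [0.2, 0.3], [-1.0, 0.8]]
b = [0.0, 0.1, -0.2]
x = [0.5, -1.0]

energy, probs = forward(W, b, x)
x_next = denoise_step(W, b, x)
```

In this sketch a single function evaluation yields both the classifier output (`probs`) and the quantity whose gradient drives generation (`energy`), which is the sense in which one set of parameters serves both tasks. The linear toy energy is not normalizable, so it only illustrates the mechanics, not a valid density.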