Learning image classification and image generation using the same set of network parameters is a challenging problem. Recent advanced approaches that perform well in one task often exhibit poor performance in the other. This work introduces an energy-based classifier and generator, EGC, which achieves superior performance in both tasks using a single neural network. Unlike a conventional classifier, which outputs a label given an image (i.e., a conditional distribution $p(y|\mathbf{x})$), the forward pass in EGC is a classifier that outputs a joint distribution $p(\mathbf{x},y)$, enabling an image generator in its backward pass by marginalizing out the label $y$. Concretely, the forward pass estimates the energy and the classification probability of a noisy image, and the backward pass estimates the score function used to denoise it. EGC achieves competitive generation results compared with state-of-the-art approaches on ImageNet-1k, CelebA-HQ and LSUN Church, while achieving superior classification accuracy and robustness against adversarial attacks on CIFAR-10. This work represents the first successful attempt to simultaneously excel in both tasks using a single set of network parameters. We believe that EGC bridges the gap between discriminative and generative learning.
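The forward/backward split described above can be illustrated with a minimal sketch. It is not the EGC architecture: it assumes the common energy-based parameterization in which the logits $f(\mathbf{x})$ define $p(\mathbf{x},y) \propto \exp(f(\mathbf{x})[y])$, so that marginalizing out $y$ gives the energy $E(\mathbf{x}) = -\log\sum_y \exp(f(\mathbf{x})[y])$. The abstract does not state this parameterization. The linear "network", the weights `W` and `b`, the helper names, and the step size are all hypothetical, and conditioning on the noise level is omitted. For a linear model the backward pass can be written analytically, which keeps the example free of dependencies.

```python
import math

def logits(W, b, x):
    """Toy linear 'network' (an assumption): one logit per class."""
    return [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]

def logsumexp(z):
    """Numerically stable log(sum(exp(z)))."""
    m = max(z)
    return m + math.log(sum(math.exp(v - m) for v in z))

def forward(W, b, x):
    """Forward pass: energy E(x) = -logsumexp(f(x)) and p(y|x) = softmax(f(x))."""
    z = logits(W, b, x)
    lse = logsumexp(z)
    energy = -lse
    probs = [math.exp(v - lse) for v in z]
    return energy, probs

def score(W, b, x):
    """Backward pass, analytic for the linear model:
    grad_x log p(x) = -dE/dx = sum_y p(y|x) * W[y]."""
    _, probs = forward(W, b, x)
    return [sum(p * row[i] for p, row in zip(probs, W)) for i in range(len(x))]

def denoise_step(W, b, x, step_size=0.1):
    """One illustrative gradient step along the score (lowers the energy)."""
    s = score(W, b, x)
    return [xi + step_size * si for xi, si in zip(x, s)]

# Hypothetical 3-class model on a 2-dimensional "image".
W = [[1.0, -0.5], [0.2, 0.8], [-1.0, 0.3]]
b = [0.0, 0.1, -0.2]
x = [0.5, -1.0]

energy, probs = forward(W, b, x)   # classification output from the forward pass
x_next = denoise_step(W, b, x)     # generation-side update from the backward pass
```

In this sketch a single set of parameters (`W`, `b`) serves both roles: `probs` is the classifier output, and `score` is the gradient of the same forward computation with respect to the input. With a deep network, that gradient would come from automatic differentiation rather than a closed form.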