Learning image classification and image generation using the same set of network parameters is a challenging problem. Recent advanced approaches that perform well in one task often exhibit poor performance in the other. This work introduces an energy-based classifier and generator, called EGC, which can achieve superior performance in both tasks using a single neural network. Unlike a conventional classifier that outputs a label given an image (i.e., a conditional distribution $p(y|\mathbf{x})$), the forward pass in EGC is a classifier that outputs a joint distribution $p(\mathbf{x},y)$, which enables an image generator in its backward pass by marginalizing out the label $y$. Concretely, the forward pass estimates the energy and the classification probability of a noisy image, while the backward pass estimates the score function used to denoise it. EGC achieves competitive generation results compared with state-of-the-art approaches on ImageNet-1k, CelebA-HQ and LSUN Church, while achieving superior classification accuracy and robustness against adversarial attacks on CIFAR-10. This work represents the first successful attempt to simultaneously excel in both tasks using a single set of network parameters. We believe that EGC bridges the gap between discriminative and generative learning.
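The forward/backward split described above can be illustrated with a minimal sketch. It is not the EGC architecture. It assumes the common energy-based reading of classifier logits, in which the unnormalized joint is $\exp(f(\mathbf{x})[y])$, so that marginalizing out $y$ gives the energy $E(\mathbf{x}) = -\log\sum_y \exp(f(\mathbf{x})[y])$ and the score is $-\nabla_{\mathbf{x}} E(\mathbf{x})$. The toy linear "network" (`W`, `B`), the function names, and the `step_size` value are all hypothetical, and the noise-level conditioning used for noisy images is omitted.

```python
import math

# Hypothetical toy "network": a linear map from a 3-dim input to 2 class logits.
# A real model would be a deep network evaluated on a noisy image.
W = [[0.5, -1.0, 0.2],
     [-0.3, 0.8, 1.1]]
B = [0.1, -0.2]


def logits(x):
    """f(x): one unnormalized log-joint value per class y."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W, B)]


def logsumexp(values):
    """Numerically stable log(sum(exp(values)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward(x):
    """Forward pass: return the energy E(x) and class probabilities p(y|x)."""
    f = logits(x)
    lse = logsumexp(f)                      # marginalizes out the label y
    energy = -lse                           # E(x) = -log sum_y exp(f(x)[y])
    probs = [math.exp(v - lse) for v in f]  # softmax, i.e. p(y|x)
    return energy, probs


def score(x):
    """Backward pass: score = -dE/dx.

    For linear logits the gradient has the closed form sum_y p(y|x) * W[y];
    a deep network would obtain it by automatic differentiation instead.
    """
    _, probs = forward(x)
    return [sum(p * row[i] for p, row in zip(probs, W)) for i in range(len(x))]


def denoise_step(x, step_size=0.1):
    """One gradient step along the score (assumed update rule, no added noise)."""
    return [xi + step_size * si for xi, si in zip(x, score(x))]
```

Calling `forward(x)` returns both quantities the classifier needs, and repeatedly applying `denoise_step` moves `x` toward lower energy. A full sampler would typically also inject noise at each step and condition the network on the noise level.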