Can we train a hybrid discriminative-generative model within a single network? This question has recently been answered in the affirmative with the introduction of the Joint Energy-based Model (JEM), which achieves high classification accuracy and image generation quality simultaneously. Despite recent advances, two performance gaps remain: an accuracy gap to the standard softmax classifier, and a generation quality gap to state-of-the-art generative models. In this paper, we introduce a variety of training techniques to bridge both gaps. 1) We incorporate the recently proposed sharpness-aware minimization (SAM) framework to train JEM, which promotes the smoothness of the energy landscape and the generalizability of JEM. 2) We exclude data augmentation from the maximum likelihood estimation pipeline of JEM, mitigating the negative impact of data augmentation on image generation quality. Extensive experiments on multiple datasets demonstrate that the resulting model, SADA-JEM, achieves state-of-the-art performance and outperforms JEM in image classification, image generation, calibration, out-of-distribution detection, and adversarial robustness by a notable margin.
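To make the SAM ingredient concrete, the following is a minimal sketch of a single sharpness-aware update in plain Python. It is not the authors' implementation: the names `sam_step`, `toy_loss`, and `toy_grad`, the quadratic loss, and the values of `lr` and `rho` are all assumptions chosen for illustration, with the toy loss standing in for the JEM training objective.

```python
import math

def sam_step(params, grad_fn, lr=0.1, rho=0.05, eps=1e-12):
    """One SAM update (illustrative sketch, hypothetical helper).

    Step 1: move to the approximate worst-case point inside an L2 ball
            of radius rho around the current weights.
    Step 2: take the gradient there and apply it to the ORIGINAL weights.
    """
    g = grad_fn(params)
    norm = math.sqrt(sum(gi * gi for gi in g)) + eps
    # Perturbed weights: theta + rho * g / ||g||
    perturbed = [p + rho * gi / norm for p, gi in zip(params, g)]
    g_sharp = grad_fn(perturbed)
    return [p - lr * gi for p, gi in zip(params, g_sharp)]

# Toy stand-in for the training loss (assumption, not the JEM objective).
def toy_loss(params):
    x, y = params
    return 0.5 * (x * x + 10.0 * y * y)

def toy_grad(params):
    x, y = params
    return [x, 10.0 * y]

theta = [1.0, 1.0]
for _ in range(200):
    theta = sam_step(theta, toy_grad, lr=0.05, rho=0.05)
final_loss = toy_loss(theta)
```

Each update costs two gradient evaluations, one at the current weights and one at the perturbed weights. With a fixed `rho`, the iterates settle in a small neighborhood of the minimum instead of converging to it exactly.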