Synthetic data generation has emerged as a useful tool for improving adversarial robustness in classification tasks, since robust learning requires significantly more training samples than standard classification. Among deep generative models, diffusion models have been shown to produce high-quality synthetic images and have performed well in improving adversarial robustness. However, diffusion-based methods are typically slower at data generation than other generative models. Although various acceleration techniques have been proposed recently, it is equally important to study how to improve the sample efficiency of the generated data for the downstream task. In this paper, we first analyze the optimality condition on the synthetic distribution for achieving non-trivial robust accuracy. We show that enhancing the distinguishability among the generated data is critical for improving adversarial robustness. We therefore propose the Contrastive-Guided Diffusion Process (Contrastive-DP), which uses a contrastive loss to guide the diffusion model during data generation. We verify our theoretical results with simulations and demonstrate the good performance of Contrastive-DP on image datasets.
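To make the guidance idea concrete, the following is a minimal sketch of how a contrastive term could steer a DDPM-style reverse step toward more distinguishable samples. It is an illustration under stated assumptions, not the paper's implementation: the noise-prediction `model`, the time-conditioned `encoder`, the schedule tensors `alphas`, `alphas_bar`, `betas`, the `guidance_scale`, and the uniformity-style contrastive objective are all hypothetical stand-ins.

```python
# Sketch: contrastive-guided DDPM-style sampling (hypothetical components).
import torch
import torch.nn.functional as F

def contrastive_loss(features, temperature=0.5):
    """Uniformity-style contrastive term: lower values mean the batch
    embeddings are more spread out (more distinguishable samples)."""
    z = F.normalize(features, dim=1)                  # (B, d) unit-norm embeddings
    sim = z @ z.t() / temperature                     # pairwise cosine similarities
    mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float("-inf"))        # drop self-similarity
    return torch.logsumexp(sim, dim=1).mean()         # penalize crowded embeddings

@torch.no_grad()
def sample_contrastive_dp(model, encoder, shape, alphas, alphas_bar, betas,
                          guidance_scale=1.0, device="cuda"):
    """Reverse diffusion where each step is shifted against the gradient of a
    contrastive loss computed on the current noisy batch (assumed setup)."""
    x = torch.randn(shape, device=device)             # start from pure noise x_T
    for t in reversed(range(len(betas))):
        t_batch = torch.full((shape[0],), t, device=device, dtype=torch.long)
        eps = model(x, t_batch)                       # predicted noise

        # Contrastive guidance: gradient of the contrastive loss w.r.t. x_t.
        with torch.enable_grad():
            x_in = x.detach().requires_grad_(True)
            loss = contrastive_loss(encoder(x_in, t_batch))
            grad = torch.autograd.grad(loss, x_in)[0]

        # Standard DDPM posterior mean, shifted along the negative contrastive
        # gradient so the reverse step also pushes samples apart (heuristic scale).
        coef = betas[t] / torch.sqrt(1.0 - alphas_bar[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])
        mean = mean - guidance_scale * betas[t] * grad

        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x
```

In this sketch the guidance plays the role of classifier guidance, but with the class-conditional log-likelihood replaced by a batch-level contrastive objective; the scaling of the gradient shift is a design choice, not prescribed by the abstract.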