Adversarial training (AT) methods are effective against adversarial attacks, yet they introduce a severe disparity in accuracy and robustness between different classes, known as the robust fairness problem. The previously proposed Fair Robust Learning (FRL) adaptively reweights different classes to improve fairness; however, the performance of the better-performing classes decreases, leading to a severe drop in overall performance. In this paper, we observe two unfair phenomena during adversarial training: the differing difficulty of generating adversarial examples from each class (source-class fairness) and the disparate target-class tendencies when generating adversarial examples (target-class fairness). Based on these observations, we propose Balance Adversarial Training (BAT) to address the robust fairness problem. For source-class fairness, we adjust the attack strength and difficulty for each class so that the generated samples lie near the decision boundary, enabling easier and fairer model learning; for target-class fairness, we introduce a uniform distribution constraint that encourages the adversarial example generation process to treat every target class with a fair tendency. Extensive experiments on multiple datasets (CIFAR-10, CIFAR-100, and ImageNette) demonstrate that our method significantly outperforms other baselines in mitigating the robust fairness problem (+5-10\% on the worst-class accuracy).
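To make the two components concrete, the following is a minimal, hypothetical sketch in PyTorch of how a PGD-style attack could incorporate them: a per-class perturbation budget (source-class fairness) and a uniform-distribution term on the adversarial prediction (target-class fairness). All names (bat_pgd, per_class_eps, uniform_weight) are illustrative assumptions, not the authors' released code, and the exact rules for setting the per-class budget are not specified here.

```python
import torch
import torch.nn.functional as F

def bat_pgd(model, x, y, per_class_eps, alpha=2/255, steps=10, uniform_weight=1.0):
    """Illustrative PGD variant with (i) a class-dependent L_inf budget, so the
    attack strength can be adjusted per class (e.g., from per-class robust
    accuracy) to keep generated samples near the decision boundary, and
    (ii) a divergence term pulling the adversarial prediction toward a uniform
    distribution, so no single target class dominates the generated examples."""
    eps = per_class_eps[y].view(-1, 1, 1, 1)                 # per-sample budget from the label
    x_adv = (x + torch.empty_like(x).uniform_(-1, 1) * eps).clamp(0, 1).detach()

    num_classes = per_class_eps.numel()
    uniform = torch.full((x.size(0), num_classes), 1.0 / num_classes, device=x.device)

    for _ in range(steps):
        x_adv.requires_grad_(True)
        logits = model(x_adv)
        # Ascend on the true-label loss while penalizing divergence between the
        # adversarial prediction and the uniform distribution over classes.
        loss = F.cross_entropy(logits, y) - uniform_weight * F.kl_div(
            F.log_softmax(logits, dim=1), uniform, reduction="batchmean")
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

# Usage sketch: start from a shared budget and let harder classes receive a
# smaller (or larger) eps according to whatever per-class statistic is tracked.
# per_class_eps = torch.full((10,), 8/255)
# x_adv = bat_pgd(model, x, y, per_class_eps)
```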