Fair Classification with Adversarial Perturbations

We study fair classification in the presence of an omniscient adversary that, given an $\eta$, is allowed to choose an arbitrary $\eta$-fraction of the training samples and arbitrarily perturb their protected attributes. The motivation comes from settings in which protected attributes can be incorrect due to strategic misreporting, malicious actors, or errors in imputation; and prior approaches that make stochastic or independence assumptions on errors may not satisfy their guarantees in this adversarial setting. Our main contribution is an optimization framework to learn fair classifiers in this adversarial setting that comes with provable guarantees on accuracy and fairness. Our framework works with multiple and non-binary protected attributes, is designed for the large class of linear-fractional fairness metrics, and can also handle perturbations besides protected attributes. We prove near-tightness of our framework's guarantees for natural hypothesis classes: no algorithm can have significantly better accuracy and any algorithm with better fairness must have lower accuracy. Empirically, we evaluate the classifiers produced by our framework for statistical rate on real-world and synthetic datasets for a family of adversaries.
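To make the threat model concrete, the following is a minimal sketch, not the paper's framework or its adversary. It computes statistical rate as the ratio of the smallest to the largest positive-prediction rate across groups, then applies a simple illustrative adversary that relabels the protected attribute of at most an $\eta$-fraction of samples. The function names `statistical_rate` and `perturb_groups`, the greedy relabeling strategy, and the toy data are all assumptions introduced for illustration.

```python
from collections import defaultdict
import math


def statistical_rate(preds, groups):
    """Ratio of the smallest to the largest positive-prediction rate across groups."""
    positives = defaultdict(int)
    totals = defaultdict(int)
    for p, z in zip(preds, groups):
        totals[z] += 1
        positives[z] += int(p == 1)
    rates = [positives[z] / totals[z] for z in totals]
    highest = max(rates)
    # If no group receives any positive prediction, treat the rates as equal.
    return min(rates) / highest if highest > 0 else 1.0


def perturb_groups(preds, groups, eta, src, dst):
    """Hypothetical adversary (an assumption, not the paper's): relabel up to
    floor(eta * n) positively predicted samples from group `src` to group `dst`."""
    budget = math.floor(eta * len(groups))
    perturbed = list(groups)
    for i, (p, z) in enumerate(zip(preds, groups)):
        if budget == 0:
            break
        if p == 1 and z == src:
            perturbed[i] = dst
            budget -= 1
    return perturbed


# Toy data: fixed classifier predictions and true protected attributes.
preds = [1, 1, 0, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b", "b", "a"]

true_rate = statistical_rate(preds, groups)
perturbed = perturb_groups(preds, groups, eta=0.2, src="a", dst="b")
observed_rate = statistical_rate(preds, perturbed)
print(true_rate, observed_rate)
```

In this toy example the classifier's predictions never change, yet relabeling only two of the ten samples moves the measured statistical rate from 2/3 to 0. This gap between the fairness measured on perturbed attributes and the fairness on the true attributes is the kind of discrepancy that stochastic or independence assumptions on the errors do not cover.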
