Addressing fairness concerns about machine learning models is a crucial step towards their long-term adoption in real-world automated systems. While many approaches have been developed for training fair models from data, little is known about the effects of data corruption on these methods. In this work we consider fairness-aware learning under arbitrary data manipulations. We show that an adversary can force any learner to return a biased classifier, with or without degrading accuracy, and that the strength of this bias increases for learning problems with underrepresented protected groups in the data. We also provide upper bounds that match these hardness results up to constant factors, by proving that two natural learning algorithms achieve order-optimal guarantees in terms of both accuracy and fairness under adversarial data manipulations.
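The following is a minimal simulation sketch, not code from the paper, illustrating the kind of effect the abstract describes: an adversary that corrupts only a small budget of training labels, all concentrated on an underrepresented protected group, can induce a sizeable demographic-parity gap in a standard learner while the overall accuracy cost stays comparatively small. All data, model choices, and parameter values here are illustrative assumptions.

```python
# Illustrative sketch (assumption: not the paper's construction or algorithm).
# An adversary flips a budget of alpha * n training labels, all inside an
# underrepresented protected group, and we compare the resulting classifier's
# demographic-parity gap and accuracy against training on clean labels.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def sample_data(n, minority_frac=0.1):
    """Synthetic data: both groups have the same label distribution given x."""
    a = (rng.random(n) < minority_frac).astype(int)      # protected attribute
    x = rng.normal(size=(n, 2))
    y = (x[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)
    return np.column_stack([x, a]), a, y

def demographic_parity_gap(clf, X, a):
    """Absolute difference in positive prediction rates between the groups."""
    yhat = clf.predict(X)
    return abs(yhat[a == 0].mean() - yhat[a == 1].mean())

X_train, a_train, y_train = sample_data(20_000)
X_test, a_test, y_test = sample_data(20_000)

# Adversarial corruption: flip positive labels to negative, but only within the
# minority group, using a total budget of alpha * n corrupted points.
alpha = 0.03
y_corrupt = y_train.copy()
minority_pos = np.flatnonzero((a_train == 1) & (y_train == 1))
budget = min(int(alpha * len(y_train)), len(minority_pos))
flip = rng.choice(minority_pos, size=budget, replace=False)
y_corrupt[flip] = 0

for name, labels in [("clean", y_train), ("corrupted", y_corrupt)]:
    clf = LogisticRegression().fit(X_train, labels)
    acc = clf.score(X_test, y_test)
    gap = demographic_parity_gap(clf, X_test, a_test)
    print(f"{name:9s}  accuracy={acc:.3f}  demographic-parity gap={gap:.3f}")
```

Because the targeted group makes up only a small share of the data, the corruption concentrated on it moves the group-conditional prediction rates far more than it moves aggregate accuracy, which is one intuition behind the abstract's claim that the achievable bias grows when protected groups are underrepresented.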