Data-driven AI systems can lead to discrimination on the basis of protected attributes such as gender or race. One reason for this behavior is societal bias encoded in the training data (e.g., females are underrepresented), which is aggravated in the presence of unbalanced class distributions (e.g., "granted" is the minority class). State-of-the-art fairness-aware machine learning approaches focus on preserving the \emph{overall} classification accuracy while improving fairness. In the presence of class imbalance, such methods may further aggravate discrimination by denying an already underrepresented group (e.g., \textit{females}) the fundamental right to equal social privileges (e.g., equal credit opportunity). To this end, we propose AdaFair, a fairness-aware boosting ensemble that changes the data distribution at each round, taking into account not only the class errors but also the fairness-related performance of the model, defined cumulatively over the partial ensemble. Beyond this in-training boosting of the group discriminated against in each round, AdaFair directly tackles class imbalance in a post-training phase by optimizing the number of ensemble learners for balanced error rate (BER). AdaFair accommodates different parity-based fairness notions and effectively mitigates discriminatory outcomes. Our experiments show that our approach achieves parity in terms of statistical parity, equal opportunity, and disparate mistreatment while maintaining good predictive performance for all classes.
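For concreteness, a schematic rendering of the two mechanisms described above follows; the symbols $\alpha_t$, $u_i$, and $H_{1:t}$ are our notation for the round-$t$ learner's weight, the cumulative fairness-related cost, and the partial ensemble after $t$ rounds, respectively, and the exact form of $u_i$ depends on the chosen parity notion:
\[
w_i^{(t+1)} \;\propto\; w_i^{(t)} \, \exp\!\big(\alpha_t\, \mathbb{1}\!\left[h_t(x_i) \neq y_i\right]\big)\,\big(1 + u_i\big),
\]
where $u_i \geq 0$ boosts instances of the group that the partial ensemble $H_{1:t}$ currently discriminates against. After training for $T$ rounds, the final ensemble size is chosen post hoc to balance the per-class errors:
\[
\theta^{*} \;=\; \operatorname*{arg\,min}_{1 \le \theta \le T} \mathrm{BER}\big(H_{1:\theta}\big),
\qquad
\mathrm{BER} \;=\; \tfrac{1}{2}\big(\mathrm{FNR} + \mathrm{FPR}\big).
\]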