Recently, numerous studies have demonstrated the presence of bias in machine-learning-powered decision-making systems. Although most definitions of algorithmic bias have solid mathematical foundations, the corresponding bias detection techniques often lack statistical rigor, especially for non-i.i.d. data. We fill this gap in the literature by presenting a rigorous non-parametric testing procedure for bias according to Predictive Rate Parity, a commonly considered notion of algorithmic bias. We adapt traditional asymptotic results for non-parametric estimators to test for bias in the presence of the dependence commonly seen in user-level data generated by technology industry applications, and we illustrate how these approaches can be leveraged for mitigation. We further propose modifications of this methodology to address bias measured through marginal outcome disparities in classification settings and extend notions of predictive rate parity to multi-objective models. Experimental results on real data show the efficacy of the proposed detection and mitigation methods.
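To make the tested quantity concrete, the sketch below checks a simple instance of predictive rate parity (equal positive predictive value across two groups) and attaches a cluster bootstrap over user IDs to respect the within-user dependence mentioned above. This is only a minimal illustration under assumed conventions: the column names (`y`, `y_hat`, `group`, `user_id`) and the bootstrap test are hypothetical and are not the non-parametric procedure developed in the paper.

```python
# Illustrative sketch only (not the paper's method): test whether the
# positive predictive value P(Y = 1 | Y_hat = 1) differs between two groups,
# using a cluster bootstrap over users so correlated rows stay together.
import numpy as np
import pandas as pd

def ppv(df):
    """Positive predictive value: P(Y = 1 | Y_hat = 1)."""
    pos = df[df["y_hat"] == 1]
    return pos["y"].mean() if len(pos) else np.nan

def ppv_gap(df, groups=("A", "B")):
    """Difference in PPV between two protected groups."""
    return ppv(df[df["group"] == groups[0]]) - ppv(df[df["group"] == groups[1]])

def cluster_bootstrap_test(df, n_boot=2000, seed=0):
    """Two-sided bootstrap p-value for H0: equal PPV across groups,
    resampling whole users to account for within-user dependence."""
    rng = np.random.default_rng(seed)
    users = df["user_id"].unique()
    by_user = {u: g for u, g in df.groupby("user_id")}
    observed = ppv_gap(df)
    gaps = []
    for _ in range(n_boot):
        sample = rng.choice(users, size=len(users), replace=True)
        boot = pd.concat([by_user[u] for u in sample], ignore_index=True)
        gaps.append(ppv_gap(boot))
    gaps = np.asarray(gaps)
    gaps = gaps[~np.isnan(gaps)]  # drop resamples with no positive predictions
    # Center the bootstrap distribution at zero to approximate the null.
    p_value = np.mean(np.abs(gaps - gaps.mean()) >= abs(observed))
    return observed, p_value
```

Resampling entire users, rather than individual rows, keeps repeated observations from the same user intact, which is the standard way a bootstrap is adapted to clustered data; the paper's asymptotic treatment of dependence is more general than this illustration.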