We introduce a boosting algorithm to pre-process data for fairness. Starting from an initial fair but inaccurate distribution, our approach shifts toward a better fit to the data while still ensuring a minimal fairness guarantee. To do so, it learns the sufficient statistics of an exponential family with boosting-compliant convergence. Importantly, we prove that the learned distribution satisfies representation rate and statistical rate data fairness guarantees. Unlike recent optimization-based pre-processing methods, our approach can be easily adapted to continuous domain features. Furthermore, when the weak learners are specified to be decision trees, the sufficient statistics of the learned distribution can be examined to provide clues on sources of (un)fairness. Empirical results are presented to demonstrate the quality of the results on real-world data.
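To make the two fairness notions concrete, the following is a minimal sketch of how they might be measured on an empirical sample. It assumes the definitions commonly used in the fair pre-processing literature, which the abstract itself does not spell out: the representation rate is the smallest ratio P(S=s)/P(S=s') over pairs of sensitive groups, and the statistical rate is the smallest ratio P(Y=1|S=s)/P(Y=1|S=s'). The function names and the `(group, label)` sample layout are hypothetical and are not taken from the paper.

```python
from collections import Counter


def representation_rate(samples):
    """Smallest ratio P(S=s) / P(S=s') over all pairs of groups.

    `samples` is an iterable of (group, label) pairs (assumed layout).
    A value of 1.0 means all groups are equally represented.
    """
    counts = Counter(group for group, _ in samples)
    return min(counts.values()) / max(counts.values())


def statistical_rate(samples, positive=1):
    """Smallest ratio P(Y=positive | S=s) / P(Y=positive | S=s').

    A value of 1.0 means every group receives the positive label
    at the same rate.
    """
    samples = list(samples)
    totals = Counter(group for group, _ in samples)
    positives = Counter(group for group, label in samples if label == positive)
    rates = [positives[group] / totals[group] for group in totals]
    if max(rates) == 0:
        # No group ever receives the positive label: treat as trivially equal.
        return 1.0
    return min(rates) / max(rates)
```

Under these assumed definitions, a guarantee of rate τ means the corresponding function returns at least τ on the learned distribution; both values lie in [0, 1], with larger values indicating a fairer distribution.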