In a recent paper, Celis et al. (2020) introduced a new approach to fairness that corrects the data distribution itself. The approach is computationally appealing, but its approximation guarantees with respect to the target distribution can be quite loose, as they rely on a (typically limited) number of constraints over data-based aggregate statistics; this also yields a fairness guarantee that can be data dependent. Our paper makes use of a mathematical object recently introduced in privacy (mollifiers of distributions) and a popular approach to machine learning (boosting) to obtain an approach in the same lineage as Celis et al. but without those impediments, including, in particular, better guarantees in terms of accuracy and finer guarantees in terms of fairness. The approach involves learning the sufficient statistics of an exponential family. When the training data is tabular, these sufficient statistics are defined by decision trees, whose interpretability can provide clues on the source of (un)fairness. Experiments display the quality of the results obtained on simulated and real-world data.
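For context, a minimal sketch of the standard exponential-family form such an approach learns (the symbols $q_0$, $\theta$, $T$, $A$ are our notation and not necessarily the paper's):
$$ q_\theta(\boldsymbol{x}) \;=\; q_0(\boldsymbol{x}) \, \exp\!\big(\theta^\top T(\boldsymbol{x}) - A(\theta)\big), $$
where $q_0$ is a base measure, $T(\boldsymbol{x})$ stacks the sufficient statistics (here, on tabular data, realized by decision trees learned via boosting), and $A(\theta)$ is the log-partition function ensuring normalization.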