We investigate the problem of algorithmic fairness in the setting where both sensitive and non-sensitive features are available, and the aim is to generate new, "oblivious" features that closely approximate the non-sensitive features while depending only minimally on the sensitive ones. We study this question in the context of kernel methods. We analyze a relaxed version of the Maximum Mean Discrepancy (MMD) criterion, which does not guarantee full independence but makes the optimization problem tractable. We derive a closed-form solution for this relaxed optimization problem and complement the result with a study of the dependencies between the newly generated features and the sensitive ones. Our key ingredient for generating such oblivious features is a Hilbert-space-valued conditional expectation, which needs to be estimated from data. We propose a plug-in approach and demonstrate how the estimation errors can be controlled. While our techniques help reduce bias, we would like to point out that no post-processing of any dataset could serve as an alternative to well-designed experiments.
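To make the role of a plug-in conditional expectation concrete, here is a minimal sketch under simplifying assumptions that are ours, not the paper's: the feature map phi is explicit and finite-dimensional (here simply the identity), the sensitive feature S is discrete, and E[phi(X) | S = s] is estimated by the sample mean of each group. The hypothetical function `oblivious_features` replaces each phi(x_i) by phi(x_i) - E[phi(X) | S = s_i] + E[phi(X)], one natural construction that is not necessarily the paper's exact estimator. It removes only the mean (linear-kernel MMD) discrepancy between groups, which loosely illustrates why a relaxed criterion need not imply full independence.

```python
import numpy as np


def oblivious_features(phi_x, s):
    """Hypothetical sketch: subtract a plug-in estimate of E[phi(X) | S] and add back E[phi(X)].

    phi_x: (n, d) array of explicit (assumed finite-dimensional) feature values.
    s:     (n,) array of discrete sensitive labels.
    """
    phi_x = np.asarray(phi_x, dtype=float)
    s = np.asarray(s)
    overall_mean = phi_x.mean(axis=0)  # plug-in estimate of E[phi(X)]
    z = np.empty_like(phi_x)
    for group in np.unique(s):
        mask = s == group
        # Plug-in estimate of E[phi(X) | S = group]: the group sample mean.
        group_mean = phi_x[mask].mean(axis=0)
        z[mask] = phi_x[mask] - group_mean + overall_mean
    return z


def linear_mmd(a, b):
    """MMD with a linear kernel, i.e. the Euclidean distance between sample means."""
    return float(np.linalg.norm(a.mean(axis=0) - b.mean(axis=0)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    s = rng.integers(0, 2, size=200)                      # binary sensitive feature
    x = rng.normal(size=(200, 3)) + 2.0 * s[:, None]      # non-sensitive features shifted by S
    z = oblivious_features(x, s)
    print("linear MMD before:", linear_mmd(x[s == 0], x[s == 1]))
    print("linear MMD after: ", linear_mmd(z[s == 0], z[s == 1]))  # numerically ~0
```

After the transformation every group has the same sample mean, so the linear-kernel MMD between groups vanishes up to floating-point error, while higher moments (for example, group-specific variances) may still depend on S.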