Recent research on fair regression has focused on developing new fairness notions and approximation methods, because the target variable, and sometimes the sensitive attribute as well, is continuous in the regression setting. However, all previous work on fair regression assumes that the training data and test data are drawn from the same distribution. This assumption is often violated in the real world because of sample selection bias between the training and test data. In this paper, we develop a framework for fair regression under sample selection bias, where the dependent variable is missing for a subset of the training samples as a result of a separate hidden process. Our framework adopts the classic Heckman model for bias correction and Lagrange duality to achieve fairness in regression under a variety of fairness notions. The Heckman model describes the sample selection process and uses a derived variable, the Inverse Mills Ratio (IMR), to correct sample selection bias. We express a variety of fairness notions as inequality and equality constraints and apply Lagrange duality theory to transform the primal problem into a convex dual optimization problem. For two popular fairness notions, mean difference and mean squared error difference, we derive explicit formulas that require no iterative optimization; for Pearson correlation, we derive the conditions under which strong duality holds. We conduct experiments on three real-world datasets, and the results demonstrate the approach's effectiveness in terms of both utility and fairness metrics.
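To make the two ingredients above concrete, the following is a minimal sketch rather than the formulas derived in the paper. It computes the IMR as φ(t)/Φ(t) for a selection index t, then solves least squares under a zero mean-difference equality constraint using the standard Lagrangian closed form. The function names, the synthetic data, the binary sensitive attribute, and the assumption that the selection index is already known (in practice it would come from a fitted probit selection equation) are all illustrative choices, not details taken from the paper.

```python
import math
import numpy as np


def inverse_mills_ratio(index):
    """Return phi(t) / Phi(t) for each selection index t (standard normal pdf / cdf)."""
    index = np.asarray(index, dtype=float)
    pdf = np.exp(-0.5 * index ** 2) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(index / math.sqrt(2.0)))
    return pdf / cdf


def fair_least_squares(X, y, s):
    """Least squares subject to equal mean predictions in groups s == 1 and s == 0.

    Hypothetical helper: minimizes ||y - X b||^2 subject to d^T b = 0, where d is
    the difference of the group-wise feature means. Stationarity of the
    Lagrangian gives b = b_ols - mu * (X^T X)^{-1} d in closed form.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    s = np.asarray(s)
    d = X[s == 1].mean(axis=0) - X[s == 0].mean(axis=0)
    gram = X.T @ X
    beta_ols = np.linalg.solve(gram, X.T @ y)
    gram_inv_d = np.linalg.solve(gram, d)
    mu = (d @ beta_ols) / (d @ gram_inv_d)  # Lagrange multiplier
    return beta_ols - mu * gram_inv_d


# Synthetic illustration (all values below are assumptions).
rng = np.random.default_rng(0)
n = 500
s = rng.integers(0, 2, size=n)               # binary sensitive attribute
x = rng.normal(size=n) + 0.8 * s             # feature correlated with s
z_index = 0.5 + 0.7 * x                      # selection index, assumed known here
u = rng.normal(size=n)                       # selection noise
e = 0.5 * u + 0.5 * rng.normal(size=n)       # outcome noise correlated with u
selected = (z_index + u) > 0                 # y is observed only where selected
y = 1.0 + 2.0 * x + e

# Augment the outcome regression on the selected samples with the IMR column.
imr = inverse_mills_ratio(z_index[selected])
X_sel = np.column_stack([np.ones(selected.sum()), x[selected], imr])
s_sel = s[selected]
beta = fair_least_squares(X_sel, y[selected], s_sel)

pred = X_sel @ beta
gap = pred[s_sel == 1].mean() - pred[s_sel == 0].mean()
print("mean difference of predictions:", gap)
```

The equality constraint is the simplest case; a thresholded inequality constraint or a continuous sensitive attribute would need a different treatment. Whether the correction term should enter the fairness constraint is a modelling choice that this sketch does not settle.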