Businesses and organizations must ensure that their algorithmic decision-making is fair in order to meet legislative, ethical, and societal demands. For example, decision-making in automated hiring must not discriminate with respect to gender or race. To achieve this, prior research has contributed approaches that ensure algorithmic fairness in machine learning predictions, while comparatively little effort has focused on algorithmic fairness in decision models, specifically off-policy learning. In this paper, we propose a novel framework for fair off-policy learning: we learn decision rules from observational data under different notions of fairness, where we explicitly assume that the observational data were collected under a different, potentially biased, behavioral policy. For this, we first formalize different fairness notions for off-policy learning. We then propose a machine learning approach to learn optimal policies under these fairness notions. Specifically, we reformulate the fairness notions into unconstrained learning objectives that can be estimated from finite samples. Here, we leverage machine learning to minimize the objective subject to a fair representation of the data, so that the resulting policies satisfy our fairness notions. We further provide theoretical guarantees in the form of generalization bounds for the finite-sample version of our framework. We demonstrate the effectiveness of our framework through extensive numerical experiments using both simulated and real-world data. As a result, our work enables algorithmic decision-making in a wide array of practical applications where fairness must be ensured.
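To make the idea of learning a policy from logged observational data while enforcing a fairness notion concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes an inverse-propensity-weighted estimate of the policy value and uses a simple group-gap penalty on the policy's action probabilities as a stand-in for the paper's fairness notions; all names (`lambda_fair`, `policy_prob`, the toy data) are hypothetical.

```python
# Illustrative sketch: off-policy learning of a stochastic policy from
# observational data via an inverse-propensity-weighted value estimate,
# with a simple fairness penalty as a placeholder for the paper's notions.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy observational data: covariates X, sensitive attribute S,
# logged binary actions A (from an unknown behavioral policy), outcomes Y.
n, d = 1000, 5
X = torch.randn(n, d)
S = (torch.rand(n) < 0.5).float()            # sensitive attribute
A = (torch.rand(n) < 0.5).float()            # logged actions
Y = X[:, 0] * A + 0.1 * torch.randn(n)       # observed outcomes

# Behavioral propensities pi_b(A=1 | X); here assumed known and constant.
prop = torch.full((n,), 0.5)

# Policy network: covariates -> representation -> action probability.
repr_net = nn.Sequential(nn.Linear(d, 16), nn.ReLU())
head = nn.Linear(16, 1)

def policy_prob(x):
    """Probability of taking action A=1 under the learned policy."""
    return torch.sigmoid(head(repr_net(x))).squeeze(-1)

opt = torch.optim.Adam(list(repr_net.parameters()) + list(head.parameters()), lr=1e-2)
lambda_fair = 1.0   # weight of the (hypothetical) fairness penalty

for step in range(200):
    pi = policy_prob(X)
    # Inverse-propensity-weighted estimate of the policy value.
    w = torch.where(A == 1, pi / prop, (1 - pi) / (1 - prop))
    value = (w * Y).mean()

    # Fairness surrogate: match mean action probabilities across groups.
    gap = (pi[S == 1].mean() - pi[S == 0].mean()).abs()

    # Maximize value while penalizing the group gap.
    loss = -value + lambda_fair * gap
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"estimated policy value: {value.item():.3f}, group gap: {gap.item():.3f}")
```

In this sketch, the unconstrained objective (negative value plus weighted penalty) mirrors the abstract's reformulation of fairness constraints into a learning objective estimable from finite samples; the paper's actual fairness notions and representation-learning component are more involved.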