In practice, decision-making algorithms are often trained on data that exhibits a variety of biases. Decision-makers often aim to make decisions based on some ground-truth target that is assumed or expected to be unbiased, i.e., equally distributed across socially salient groups. In many practical settings, the ground truth cannot be directly observed, and we instead have to rely on a biased proxy measure of the ground truth, i.e., biased labels, in the data. In addition, data is often selectively labeled, i.e., even the biased labels are observed only for the small fraction of the data that received a positive decision. To overcome label and selection biases, recent work proposes to learn stochastic, exploring decision policies via i) online training of new policies at each time-step and ii) enforcing fairness as a constraint on performance. However, the existing approach uses only labeled data, disregarding a large amount of unlabeled data, and therefore suffers from high instability and variance in the decision policies learned at different times. In this paper, we propose a novel method based on a variational autoencoder for practical fair decision-making. Our method learns an unbiased data representation from both labeled and unlabeled data and uses this representation to learn a policy in an online process. Using synthetic data, we empirically validate that our method converges, with low variance, to the policy that is optimal (fair) with respect to the ground truth. In real-world experiments, we further show that our training approach not only offers a more stable learning process but also yields policies with higher fairness and higher utility than previous approaches.
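The data setting described in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's method or data: the function name `simulate_selective_labels`, the two-group setup, the 0.3 label-flip rate, and the fixed acceptance probability are all hypothetical choices made for illustration. It shows a ground truth that is equally distributed across groups, a proxy label that is biased against one group, and a stochastic policy under which labels are observed only for positive decisions.

```python
import random


def simulate_selective_labels(n=1000, accept_prob=0.3, seed=0):
    """Simulate label bias and selective labeling (illustrative assumptions only)."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        group = rng.randint(0, 1)  # hypothetical binary socially salient group
        # Ground truth: same positive rate in both groups (unbiased target).
        truth = rng.random() < 0.5
        # Biased proxy label: true positives in group 1 are recorded as
        # negative with probability 0.3 (assumed bias mechanism).
        flipped = group == 1 and truth and rng.random() < 0.3
        proxy = truth and not flipped
        # Stochastic, exploring policy: accept with a fixed probability.
        decision = rng.random() < accept_prob
        # Selective labeling: the proxy label is seen only after acceptance.
        label = proxy if decision else None
        records.append({"group": group, "decision": decision, "label": label})
    return records


records = simulate_selective_labels()
labeled = [r for r in records if r["label"] is not None]
print(f"{len(labeled)} of {len(records)} records carry a label")
```

In this sketch the unlabeled records still carry their features (here only `group`), which is the kind of data a labeled-only approach discards.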