A supervised machine learning algorithm estimates, from a learning sample, a model that will be used to predict new observations. To this end, it aggregates individual characteristics of the observations in the learning sample. This aggregation of information, however, does not account for potential selection on unobservables or for status-quo biases that may be present in the training sample. The latter bias has raised concerns about the so-called \textit{fairness} of machine learning algorithms, especially towards disadvantaged groups. In this chapter, we review the issue of fairness in machine learning through the lens of structural econometric models in which the unknown index is the solution of a functional equation and issues of endogeneity are explicitly accounted for. We model fairness as a linear operator whose null space contains the set of strictly \textit{fair} indexes. A \textit{fair} solution is obtained either by projecting the unconstrained index onto the null space of this operator, or by directly searching within this null space for the closest solution of the functional equation. We also acknowledge that policymakers may incur a cost when moving away from the status quo. \textit{Approximate fairness} is achieved by introducing a fairness penalty in the learning procedure, which balances the influence of the status quo against that of a fully fair solution.
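As a minimal sketch of this idea (the notation below is illustrative and is not taken from the chapter), suppose the structural index $\varphi$ solves a functional equation $T\varphi = r$ and fairness is encoded by a linear operator $F$ with null space $\mathcal{N}(F)$. A fully fair solution can then be written as a projection, while approximate fairness can be cast as a penalized problem:
\begin{equation*}
\varphi^{\mathrm{fair}} \;=\; \Pi_{\mathcal{N}(F)}\,\varphi,
\qquad
\varphi_{\lambda} \;\in\; \arg\min_{\varphi}\; \|T\varphi - r\|^{2} \;+\; \lambda\,\|F\varphi\|^{2},
\end{equation*}
where $\Pi_{\mathcal{N}(F)}$ denotes the orthogonal projection onto $\mathcal{N}(F)$ and the penalty weight $\lambda \geq 0$ governs how heavily the solution is pulled away from the status quo towards a fully fair index.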