Training fair machine learning models, making them interpretable, and handling domain shift have attracted considerable interest in recent years. There is a large body of work on these topics, but mostly in isolation. In this work we show that they can be addressed within a common framework of learning invariant representations. The representations should allow prediction of the target while being invariant to sensitive attributes that split the dataset into subgroups. Our approach is based on the simple observation that no learning algorithm can differentiate samples that share the same feature representation. We formulate this as an additional loss (regularizer) that enforces a common feature representation across subgroups. We apply it to learn fair models and to interpret the influence of the sensitive attribute. Furthermore, it can be used for domain adaptation, transferring knowledge, and learning effectively from very few examples. In all applications it is essential not only to learn to predict the target, but also to learn what to ignore.
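A minimal sketch (assuming PyTorch) of the kind of invariance regularizer described above: a penalty that pushes the feature representations of the subgroups defined by a sensitive attribute toward a common representation, added to the usual prediction loss. The penalty here is a simple squared distance between subgroup feature means, and the names `encoder`, `classifier`, and the weight `lam` are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F


def invariance_penalty(features: torch.Tensor, sensitive: torch.Tensor) -> torch.Tensor:
    """Squared distance between the mean features of the two subgroups
    induced by a binary sensitive attribute (illustrative choice of penalty)."""
    mean_a = features[sensitive == 0].mean(dim=0)
    mean_b = features[sensitive == 1].mean(dim=0)
    return ((mean_a - mean_b) ** 2).sum()


def total_loss(encoder, classifier, x, y, sensitive, lam=1.0):
    """Target prediction loss plus the invariance regularizer, weighted by lam."""
    z = encoder(x)                                # shared feature representation
    task = F.cross_entropy(classifier(z), y)      # learn to predict the target
    return task + lam * invariance_penalty(z, sensitive)  # learn what to ignore
```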