We study the problem of training a model that must obey demographic fairness conditions when the sensitive features are not available at training time -- in other words, how can we train a model to be fair by race when we don't have data about race? We adopt a fairness pipeline perspective, in which an "upstream" learner that does have access to the sensitive features will learn a proxy model for these features from the other attributes. The goal of the proxy is to allow a general "downstream" learner -- with minimal assumptions on their prediction task -- to be able to use the proxy to train a model that is fair with respect to the true sensitive features. We show that obeying multiaccuracy constraints with respect to the downstream model class suffices for this purpose, provide sample- and oracle-efficient algorithms and generalization bounds for learning such proxies, and conduct an experimental evaluation. In general, multiaccuracy is much easier to satisfy than classification accuracy, and can be satisfied even when the sensitive features are hard to predict.
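As a rough illustration of the constraint referenced above, one standard formulation of multiaccuracy requires the proxy's error to be (nearly) uncorrelated with every model in the downstream class. The notation below ($\hat{z}$ for the proxy, $z$ for the true sensitive feature, $\mathcal{H}$ for the downstream model class, $\epsilon$ for the tolerance) is illustrative and need not match the paper's exact definitions.

```latex
% Multiaccuracy of a proxy \hat{z} with respect to a downstream class \mathcal{H}:
% for every downstream model h, the proxy's error is nearly uncorrelated with h's predictions.
\[
  \Bigl| \, \mathbb{E}_{(x,z)\sim \mathcal{D}}\bigl[\, h(x)\,\bigl(\hat{z}(x) - z\bigr) \,\bigr] \, \Bigr|
  \;\le\; \epsilon
  \qquad \text{for all } h \in \mathcal{H}.
\]
```

Under this kind of condition, the proxy only needs to be correct "on average" against each downstream model rather than pointwise, which is why it can be satisfiable even when the sensitive features themselves are hard to predict accurately.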