We study the problem of training a model that must obey demographic fairness conditions when the sensitive features are not available at training time -- in other words, how can we train a model to be fair by race when we don't have data about race? We adopt a fairness pipeline perspective, in which an "upstream" learner that does have access to the sensitive features will learn a proxy model for these features from the other attributes. The goal of the proxy is to allow a general "downstream" learner -- with minimal assumptions on their prediction task -- to be able to use the proxy to train a model that is fair with respect to the true sensitive features. We show that obeying multiaccuracy constraints with respect to the downstream model class suffices for this purpose, and provide sample- and oracle-efficient algorithms and generalization bounds for learning such proxies. In general, multiaccuracy can be much easier to satisfy than classification accuracy, and can be satisfied even when the sensitive features are hard to predict.
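To make the multiaccuracy condition concrete, the sketch below illustrates the idea of auditing a proxy for the sensitive feature against a finite collection of downstream test functions and applying boosting-style corrections until no test function detects a residual correlation. This is a minimal illustration under assumed conventions (the names audit_class, tol, lr, and the additive-update scheme are illustrative), not the paper's algorithm or its theoretical guarantees.

```python
# Minimal sketch of multiaccuracy auditing for a sensitive-feature proxy.
# Assumption: z is a binary sensitive feature in {0, 1}, z_hat is a proxy in [0, 1],
# and audit_class is a finite list of callables h(X) -> array of predictions,
# standing in for the downstream model class.
import numpy as np


def multiaccuracy_violation(z, z_hat, h_vals):
    """Average correlation between the proxy residual (z - z_hat) and a test
    function's predictions; near zero for every h means the proxy is
    (approximately) multiaccurate with respect to the audited class."""
    return np.mean((z - z_hat) * h_vals)


def fit_multiaccurate_proxy(X, z, audit_class, tol=1e-3, lr=0.5, max_iter=100):
    """Boosting-style corrections: while some h in audit_class exhibits a
    residual correlation above tol, shift the proxy in the direction of h."""
    z_hat = np.full(len(z), z.mean())  # start from the base rate
    for _ in range(max_iter):
        # Evaluate every candidate test function on the data: shape (|H|, n).
        h_matrix = np.array([h(X) for h in audit_class])
        violations = h_matrix @ (z - z_hat) / len(z)
        worst = np.argmax(np.abs(violations))
        if np.abs(violations[worst]) <= tol:
            break  # multiaccurate with respect to audit_class, up to tol
        # Additive correction along the violating test function, kept in [0, 1].
        z_hat = np.clip(z_hat + lr * violations[worst] * h_matrix[worst], 0.0, 1.0)
    return z_hat
```

Note that the audit only requires the residual z - z_hat to be uncorrelated with each test function, not that z_hat predict z accurately, which is why the condition can hold even when the sensitive features themselves are hard to predict.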