Text representations learned by machine learning models often encode undesirable demographic information about users. Predictive models built on these representations can rely on such information, resulting in biased decisions. We present a novel debiasing technique, Fairness-aware Rate Maximization (FaRM), that removes protected information by making representations of instances belonging to the same protected attribute class uncorrelated, using the rate-distortion function. FaRM can debias representations either with or without a target task at hand, and it can also be adapted to remove information about multiple protected attributes simultaneously. Empirical evaluations show that FaRM achieves state-of-the-art performance on several datasets, and the learned representations leak significantly less protected attribute information under attack by a non-linear probing network.
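For intuition, here is a minimal sketch of a standard coding-rate form of the rate-distortion function that objectives of this kind typically build on; the notation is ours, not necessarily the paper's: \(\mathbf{Z}\) is a \(d \times n\) matrix of \(n\) representations, \(\epsilon\) is the allowed distortion, \(\mathbf{Z}_g\) is the submatrix of instances whose protected attribute takes value \(g\), and \(\theta\) denotes the encoder parameters. The exact FaRM objective may differ.

\[
R(\mathbf{Z};\epsilon) \;=\; \frac{1}{2}\log\det\!\Big(\mathbf{I}_d + \frac{d}{n\epsilon^{2}}\,\mathbf{Z}\mathbf{Z}^{\top}\Big),
\qquad
\max_{\theta}\; \sum_{g} R(\mathbf{Z}_g;\epsilon).
\]

Because \(\log\det\) grows as the representations span more directions, maximizing the within-group rate \(R(\mathbf{Z}_g;\epsilon)\) pushes representations that share a protected attribute value toward being mutually uncorrelated, which is what prevents a probe from recovering the attribute from their correlation structure.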