Text representations learned by machine learning models often encode undesirable demographic information about the user. Predictive models built on these representations can rely on such information, resulting in biased decisions. We present a novel debiasing technique, Fairness-aware Rate Maximization (FaRM), which removes demographic information by making representations of instances belonging to the same protected attribute class uncorrelated, using the rate-distortion function. FaRM is able to debias representations with or without a target task at hand, and can also be adapted to simultaneously remove information about multiple protected attributes. Empirical evaluations show that FaRM achieves state-of-the-art performance on several datasets, and that its learned representations leak significantly less protected attribute information under attack by a non-linear probing network.
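To make the rate-distortion objective concrete, below is a minimal PyTorch sketch. It assumes the coding-rate form R(Z, ε) = ½ log det(I + d/(nε²) ZᵀZ) common in the rate-reduction literature; the function names `coding_rate` and `farm_style_loss`, the distortion parameter `eps`, and the per-class summation are illustrative simplifications, not FaRM's exact objective, which may include additional task-specific terms.

```python
import torch

def coding_rate(Z: torch.Tensor, eps: float = 0.5) -> torch.Tensor:
    """Coding rate (rate-distortion) of representations Z with shape (n, d):
    R(Z, eps) = 1/2 * logdet(I + d / (n * eps^2) * Z^T Z).
    Higher values mean the rows of Z are more spread out / decorrelated."""
    n, d = Z.shape
    identity = torch.eye(d, device=Z.device, dtype=Z.dtype)
    scale = d / (n * eps ** 2)
    return 0.5 * torch.logdet(identity + scale * (Z.T @ Z))

def farm_style_loss(Z: torch.Tensor, attr: torch.Tensor, eps: float = 0.5) -> torch.Tensor:
    """Illustrative debiasing loss (an assumption, not the paper's exact form):
    the negative sum of per-protected-class coding rates. Minimizing it
    maximizes the rate of each class's representations, decorrelating
    instances that share a protected attribute value so a probe cannot
    recover the attribute from Z."""
    loss = Z.new_zeros(())
    for a in attr.unique():
        loss = loss - coding_rate(Z[attr == a], eps)
    return loss
```

In training, this loss would be applied to the encoder's output batch (optionally alongside a target-task loss when a downstream task is available), so that representations within each protected attribute class span a large volume rather than clustering by demographic group.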