Modern neural models trained on textual data rely on pre-trained representations that emerge without direct supervision. As these representations are increasingly used in real-world applications, the inability to \emph{control} their content becomes an increasingly important problem. We formulate the problem of identifying and erasing a linear subspace that corresponds to a given concept, in order to prevent linear predictors from recovering the concept. We model this problem as a constrained linear minimax game, and show that existing solutions are generally not optimal for this task. We derive a closed-form solution for certain objectives, and propose a convex relaxation, R-LACE, that works well for others. When evaluated on the task of binary gender removal, the method recovers a low-dimensional subspace whose removal mitigates bias under both intrinsic and extrinsic evaluation. We show that the method, despite being linear, is highly expressive, effectively mitigating bias in deep nonlinear classifiers while maintaining tractability and interpretability.
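As a hedged sketch of the kind of objective described above (the notation $P$, $W$, $\theta$, $\ell$, $k$, and $\mathcal{P}_k$ is illustrative and not taken from the text), the constrained minimax game can be written with an erasure player that chooses an orthogonal projection $P$ neutralizing a rank-$k$ subspace, playing against the strongest linear predictor of the concept:
\[
\max_{P \in \mathcal{P}_k} \; \min_{\theta \in \mathbb{R}^{d}} \; \frac{1}{N} \sum_{i=1}^{N} \ell\big(y_i,\; \theta^{\top} P x_i\big),
\qquad
\mathcal{P}_k = \big\{\, I_d - W W^{\top} \;:\; W \in \mathbb{R}^{d \times k},\; W^{\top} W = I_k \,\big\},
\]
where $x_i \in \mathbb{R}^d$ is a pre-trained representation, $y_i$ is the concept label (e.g., binary gender), and $\ell$ is a linear-classification loss; the outer player removes a rank-$k$ subspace so that no linear predictor $\theta$ can recover the concept from the projected representation $P x_i$.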