Representation learning algorithms offer the opportunity to learn representations of the input data that are invariant to nuisance factors. Many authors have leveraged such strategies to learn fair representations, i.e., vectors from which information about sensitive attributes has been removed. These methods are attractive because they can be interpreted as minimizing the mutual information between a neural layer's activations and a sensitive attribute. However, the theoretical grounding of such methods relies either on the computation of infinitely accurate adversaries or on minimizing a variational upper bound of a mutual information estimate. In this paper, we propose a methodology for directly computing the mutual information between a neural layer and a sensitive attribute. We employ stochastically-activated binary neural networks, which let us treat neurons as random variables. We are then able to compute (not bound) the mutual information between a layer and a sensitive attribute and to use this quantity as a regularization term during gradient descent. We show that this method compares favorably with the state of the art in fair representation learning and that the learned representations display a higher degree of invariance than those produced by full-precision neural networks.
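To make the core idea concrete, below is a minimal PyTorch-style sketch, not the authors' implementation, of how the mutual information between stochastically-activated binary neurons and a binary sensitive attribute can be computed in closed form from batch statistics and added to the loss as a regularizer. The per-neuron decomposition (summing I(B_i; S) over neurons rather than computing the joint MI of the whole layer), the function names, and the estimator details are illustrative assumptions.

```python
import torch


def bernoulli_entropy(p, eps=1e-8):
    # Entropy of a Bernoulli(p) variable, computed elementwise.
    p = p.clamp(eps, 1 - eps)
    return -(p * p.log() + (1 - p) * (1 - p).log())


def layer_attribute_mi(probs, s):
    """Mutual information between each stochastic binary neuron and a
    binary sensitive attribute, summed over neurons.

    probs: (batch, n_neurons) firing probabilities, e.g. sigmoid(logits).
    s:     (batch,) binary sensitive attribute in {0, 1}.

    Each neuron B_i is a Bernoulli variable, so with batch-estimated
    probabilities we can evaluate exactly:
        I(B_i; S) = H(B_i) - sum_s P(S=s) H(B_i | S=s).
    Summing per-neuron MI is a simplifying assumption; the exact joint MI
    of the layer would require enumerating all 2^n activation patterns.
    """
    s = s.float().unsqueeze(1)                    # (batch, 1)
    p_s1 = s.mean()                               # P(S = 1)
    p_marginal = probs.mean(dim=0)                # P(B_i = 1)
    # Conditional firing probabilities; clamp guards against a batch
    # that happens to contain only one group.
    p_given_s1 = (probs * s).sum(0) / s.sum().clamp(min=1)
    p_given_s0 = (probs * (1 - s)).sum(0) / (1 - s).sum().clamp(min=1)
    h_marginal = bernoulli_entropy(p_marginal)
    h_cond = (p_s1 * bernoulli_entropy(p_given_s1)
              + (1 - p_s1) * bernoulli_entropy(p_given_s0))
    return (h_marginal - h_cond).sum()


# Usage sketch: penalize the MI term alongside the task loss, where
# `logits` are the layer's pre-activations and `lambda_mi` is a
# hypothetical regularization weight.
# loss = task_loss + lambda_mi * layer_attribute_mi(torch.sigmoid(logits), s)
```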