Self-supervised learning is a form of unsupervised learning that leverages the rich information in data to learn representations. However, data sometimes contains information that may be undesirable for downstream tasks. For instance, gender information may lead to biased decisions on many gender-irrelevant tasks. In this paper, we develop conditional contrastive learning to remove undesirable information from self-supervised representations. To remove the effect of an undesirable variable, our proposed approach conditions on that variable (i.e., fixes its variations) during the contrastive learning process. In particular, inspired by the contrastive objective InfoNCE, we introduce Conditional InfoNCE (C-InfoNCE) and its computationally efficient variant, Weak-Conditional InfoNCE (WeaC-InfoNCE), for conditional contrastive learning. We demonstrate empirically that our methods learn self-supervised representations that remain useful for downstream tasks while removing a substantial amount of information related to the undesirable variables. We study three scenarios, each with a different type of undesirable variable: task-irrelevant meta-information for self-supervised speech representation learning, sensitive attributes for fair representation learning, and domain specification for multi-domain visual representation learning.
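For concreteness, recall the standard InfoNCE objective, which contrasts a positive pair $(x, y) \sim P_{X,Y}$ against $n$ negatives $\{y_i\}_{i=1}^{n} \sim P_Y$ using a scoring function $f$:
\[
\sup_{f}\;\mathbb{E}_{(x,y)\sim P_{X,Y},\,\{y_i\}\sim P_Y}\Big[\log \frac{e^{f(x,y)}}{\frac{1}{n}\sum_{i=1}^{n} e^{f(x,y_i)}}\Big].
\]
As a sketch consistent with the description above (the paper's definitions are authoritative), C-InfoNCE conditions both the positive and the negative sampling on a fixed value $z$ of the undesirable variable $Z$:
\[
\sup_{f}\;\mathbb{E}_{z\sim P_Z}\,\mathbb{E}_{(x,y)\sim P_{X,Y\mid z},\,\{y_i\}\sim P_{Y\mid z}}\Big[\log \frac{e^{f(x,y,z)}}{\frac{1}{n}\sum_{i=1}^{n} e^{f(x,y_i,z)}}\Big].
\]
One plausible reading of the computationally efficient variant, WeaC-InfoNCE, is that it weakens the conditioning on the negatives (drawing $\{y_i\}$ from the marginal $P_Y$ rather than $P_{Y\mid z}$), which avoids gathering many samples sharing the same $z$.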