Correlations between factors of variation are prevalent in real-world data. Exploiting such correlations may increase predictive performance on noisy data; however, correlations are often not robust (e.g., they may change across domains, datasets, or applications), and models that exploit them fail to generalize when correlations shift. Disentanglement methods aim to learn representations that capture different factors of variation in separate latent subspaces. A common approach minimizes the mutual information between latent subspaces, such that each encodes a single underlying attribute. However, this fails when attributes are correlated. We solve this problem by enforcing independence between subspaces conditioned on the available attributes, which allows us to remove only dependencies that are not due to the correlation structure present in the training data. We achieve this via an adversarial approach that minimizes the conditional mutual information (CMI) between subspaces with respect to categorical variables. We first show theoretically that CMI minimization is a good objective for robust disentanglement on linear problems. We then apply our method to real-world datasets based on MNIST and CelebA, and show that it yields models that are disentangled and robust under correlation shift, including in weakly supervised settings.
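To make the distinction between marginal and conditional independence concrete, here is a small self-contained sketch (not the paper's method) that estimates discrete mutual information and conditional mutual information from samples. The variable names and noise level are illustrative assumptions: two latent codes that both track a shared attribute are marginally dependent, yet conditionally independent given that attribute, which is exactly the kind of attribute-induced dependence a CMI objective would preserve while a plain MI objective would penalize.

```python
import numpy as np
from collections import Counter

def discrete_mi(x, y):
    """Empirical mutual information I(X; Y) in nats for discrete samples."""
    n = len(x)
    pxy = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    return sum((c / n) * np.log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def discrete_cmi(x, y, z):
    """Empirical conditional MI I(X; Y | Z): average MI within each Z stratum."""
    n = len(z)
    cmi = 0.0
    for zv, cz in Counter(z).items():
        xs = [xi for xi, zi in zip(x, z) if zi == zv]
        ys = [yi for yi, zi in zip(y, z) if zi == zv]
        cmi += (cz / n) * discrete_mi(xs, ys)
    return cmi

# Toy generative process (an assumption for illustration): a binary
# attribute drives two latent codes, each corrupted by independent noise.
rng = np.random.default_rng(0)
n = 20000
attr = rng.integers(0, 2, n)              # shared categorical attribute
z1 = (attr ^ (rng.random(n) < 0.1)).tolist()  # code 1: noisy copy of attr
z2 = (attr ^ (rng.random(n) < 0.1)).tolist()  # code 2: independent noisy copy
y = attr.tolist()

mi = discrete_mi(z1, z2)       # clearly positive: codes co-vary via attr
cmi = discrete_cmi(z1, z2, y)  # near zero: independent once attr is known
print(f"MI(z1; z2)        = {mi:.3f} nats")
print(f"CMI(z1; z2 | y)   = {cmi:.3f} nats")
```

A marginal-MI penalty would push the large first quantity to zero and thereby destroy the codes' legitimate relationship to the attribute; the CMI objective targets only the second quantity, which is already (approximately) zero here.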