Correlations between factors of variation are prevalent in real-world data. Machine learning algorithms may benefit from exploiting such correlations, as they can increase predictive performance on noisy data. However, such correlations are often not robust (e.g., they may change between domains, datasets, or applications) and we wish to avoid exploiting them. Disentanglement methods aim to learn representations which capture different factors of variation in latent subspaces. A common approach involves minimizing the mutual information between latent subspaces, such that each encodes a single underlying attribute. However, this fails when attributes are correlated. We solve this problem by enforcing independence between subspaces conditioned on the available attributes, which allows us to remove only dependencies that are not due to the correlation structure present in the training data. We achieve this via an adversarial approach to minimize the conditional mutual information (CMI) between subspaces with respect to categorical variables. We first show theoretically that CMI minimization is a good objective for robust disentanglement on linear problems with Gaussian data. We then apply our method to real-world datasets based on MNIST and CelebA, and show that it yields models that are disentangled and robust under correlation shift, including in weakly supervised settings.
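As a rough sketch of the objective described above (the notation is assumed here rather than taken from the paper: $Z_1, Z_2$ denote two latent subspaces and $Y$ the observed categorical attribute), the conditional mutual information being minimized can be written as
\[
I(Z_1; Z_2 \mid Y) \;=\; \mathbb{E}_{p(y)}\!\left[\, \mathbb{E}_{p(z_1, z_2 \mid y)} \log \frac{p(z_1, z_2 \mid y)}{p(z_1 \mid y)\, p(z_2 \mid y)} \right].
\]
Driving this quantity to zero enforces independence between the subspaces given $Y$, while still permitting whatever dependence between $Z_1$ and $Z_2$ is induced by correlations among the attributes themselves; by contrast, minimizing the unconditional mutual information $I(Z_1; Z_2)$ would also penalize that attribute-induced dependence.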