A core challenge in Machine Learning is to learn to disentangle natural factors of variation in data (e.g., object shape vs. pose). A popular approach to disentanglement consists of learning to map each of these factors to distinct subspaces of a model's latent representation. However, this approach has shown limited empirical success to date. Here, we show that, for a broad family of transformations acting on images (encompassing simple affine transformations such as rotations and translations), this approach to disentanglement introduces topological defects, i.e., discontinuities in the encoder. Motivated by classical results from group representation theory, we study an alternative, more flexible approach to disentanglement that relies on distributed latent operators, potentially acting on the entire latent space. We theoretically and empirically demonstrate the effectiveness of this approach for disentangling affine transformations. Our work lays a theoretical foundation for the recent success of a new generation of models using distributed operators for disentanglement.
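To make the contrast concrete, below is a minimal NumPy sketch (an illustration, not the paper's implementation) of a distributed latent operator for image rotation. In the group-representation view, a rotation by an angle theta acts on the entire latent vector as a block-diagonal matrix of 2x2 rotation blocks at different integer frequencies, mirroring the irreducible representations of SO(2), rather than modifying only a dedicated "rotation subspace". The function name, latent dimensionality, and frequency assignment are assumptions made for this sketch.

```python
import numpy as np

def distributed_rotation_operator(theta: float, latent_dim: int) -> np.ndarray:
    """Build a block-diagonal latent operator for an image rotation by theta.

    The operator acts on the *entire* latent space: each 2x2 block rotates
    a pair of latent coordinates at an integer frequency k, echoing the
    irreducible representations of the rotation group SO(2).
    (Illustrative sketch; not the paper's architecture.)
    """
    assert latent_dim % 2 == 0, "latent_dim must be even to form 2x2 blocks"
    op = np.zeros((latent_dim, latent_dim))
    for i in range(latent_dim // 2):
        k = i + 1  # assumed frequency for this pair of coordinates
        c, s = np.cos(k * theta), np.sin(k * theta)
        op[2 * i : 2 * i + 2, 2 * i : 2 * i + 2] = np.array([[c, -s], [s, c]])
    return op

# The latent code of a rotated image is the operator applied to the
# original code: z_rot = rho(theta) @ z, acting on all coordinates at once.
z = np.random.randn(8)
z_rot = distributed_rotation_operator(np.pi / 4, 8) @ z
```

Because every coordinate pair rotates continuously with theta, this operator avoids the discontinuity that arises when a periodic factor such as rotation is forced into a single scalar latent coordinate.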