Disentangling complex data into its latent factors of variation is a fundamental task in representation learning. Existing work on sequential disentanglement mostly provides two-factor representations, i.e., it separates the data into time-varying and time-invariant factors. In contrast, we consider multifactor disentanglement, in which multiple (more than two) semantically disentangled components are generated. Key to our approach is a strong inductive bias: we assume that the underlying dynamics can be represented linearly in the latent space. Under this assumption, it becomes natural to exploit the recently introduced Koopman autoencoder models. However, disentangled representations are not guaranteed in Koopman approaches, and thus we propose a novel spectral loss term that leads to structured Koopman matrices and disentanglement. Overall, we propose a simple, easy-to-code deep model that is fully unsupervised and supports multifactor disentanglement. We showcase new disentangling abilities, such as swapping individual static factors between characters and incrementally swapping disentangled factors from a source to a target. Moreover, we evaluate our method extensively on standard two-factor benchmark tasks, where we significantly improve over competing unsupervised approaches, and we perform competitively in comparison to weakly- and self-supervised state-of-the-art approaches. The code is available at https://github.com/azencot-group/SKD.
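To make the core idea concrete, the following is a minimal, illustrative sketch of a Koopman autoencoder with a spectral penalty on the Koopman matrix. It is not the authors' implementation (see the linked repository for that); the architecture, the least-squares estimate of the Koopman operator, and the particular penalty pushing a chosen number of eigenvalues toward 1 (so those latent directions act as time-invariant, "static" factors) are all simplifying assumptions made for exposition.

```python
import torch
import torch.nn as nn


class KoopmanAE(nn.Module):
    """Minimal Koopman autoencoder: encode each frame, advance latents
    linearly in time with a matrix K, and decode back to data space."""

    def __init__(self, input_dim, latent_dim):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim)
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, input_dim)
        )

    def forward(self, x):
        # x: (batch, time, input_dim)
        z = self.encoder(x)
        d = z.shape[-1]
        # Least-squares Koopman matrix K mapping z_t -> z_{t+1}
        z_past = z[:, :-1].reshape(-1, d)
        z_future = z[:, 1:].reshape(-1, d)
        K = torch.linalg.lstsq(z_past, z_future).solution  # (latent, latent)
        z_pred = z_past @ K
        x_rec = self.decoder(z)
        # Training would combine reconstruction loss ||x - x_rec||,
        # linear-prediction loss ||z_pred - z_future||, and a spectral term.
        return x_rec, z_pred, z_future, K


def spectral_penalty(K, n_static):
    """Illustrative spectral loss (our naming, not the paper's exact term):
    encourage n_static eigenvalues of K to sit at 1, so the corresponding
    latent subspace is constant in time (time-invariant factors)."""
    eigvals = torch.linalg.eigvals(K)          # complex eigenvalues of K
    dist = (eigvals - 1.0).abs()               # distance of each eigenvalue to 1
    closest = torch.sort(dist).values[:n_static]
    return closest.sum()
```

Structuring the spectrum this way is what yields the structured Koopman matrices mentioned in the abstract: eigenvalues pinned at 1 index static content, while the remaining eigenvalues carry the dynamics, so factors can be swapped by exchanging the corresponding eigenspace coordinates between sequences.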