Real-world objects perform complex motions that involve multiple independent motion components. For example, while talking, a person continuously changes their expressions, head, and body pose. In this work, we propose a novel method to decompose motion in videos by using a pretrained image GAN model. We discover disentangled motion subspaces in the latent space of widely used style-based GAN models that are semantically meaningful and control a single explainable motion component. The proposed method uses only a few $(\approx10)$ ground truth video sequences to obtain such subspaces. We extensively evaluate the disentanglement properties of motion subspaces on face and car datasets, quantitatively and qualitatively. Further, we present results for multiple downstream tasks such as motion editing, and selective motion transfer, e.g. transferring only facial expressions without training for it.
翻译:暂无翻译