We propose $\textbf{VidStyleODE}$, a spatiotemporally continuous disentangled $\textbf{Vid}$eo representation based upon $\textbf{Style}$GAN and Neural-$\textbf{ODE}$s. Effective traversal of the latent space learned by Generative Adversarial Networks (GANs) has been the basis for recent breakthroughs in image editing. However, the applicability of such advancements to the video domain has been hindered by the difficulty of representing and controlling videos in the latent space of GANs. In particular, videos are composed of content (i.e., appearance) and complex motion components that require a special mechanism to disentangle and control. To achieve this, VidStyleODE encodes the video content in a pre-trained StyleGAN $\mathcal{W}_+$ space and benefits from a latent ODE component to summarize the spatiotemporal dynamics of the input video. Our novel continuous video generation process then combines the two to generate high-quality and temporally consistent videos with varying frame rates. We show that our proposed method enables a variety of applications on real videos: text-guided appearance manipulation, motion manipulation, image animation, and video interpolation and extrapolation. Project website: https://cyberiada.github.io/VidStyleODE
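For concreteness, the sketch below illustrates one plausible reading of this pipeline in PyTorch: per-frame inversions into StyleGAN's $\mathcal{W}_+$ space are summarized into a global content code, a latent ODE evolves a dynamics state continuously in time, and the two are combined into per-frame $\mathcal{W}_+$ latents for a frozen StyleGAN generator. All module names and architectural choices here (GRU encoders, fixed-step Euler integration, the `combine` head) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class DynamicsODE(nn.Module):
    """Latent ODE: a small MLP models the time derivative of a dynamics state z(t)."""
    def __init__(self, dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, dim))

    def forward(self, z, t0, t1, steps=10):
        # Fixed-step Euler integration from t0 to t1 (a stand-in for an
        # adaptive ODE solver such as torchdiffeq's odeint).
        dt = (t1 - t0) / steps
        for _ in range(steps):
            z = z + dt * self.f(z)
        return z

class VidStyleODESketch(nn.Module):
    """Hypothetical pipeline: content code + continuously evolved dynamics state."""
    def __init__(self, w_dim=512, z_dim=128):
        super().__init__()
        # Summarizes the frame latents into a single global content code (stand-in).
        self.content_enc = nn.GRU(w_dim, w_dim, batch_first=True)
        # Encodes the observed dynamics into the latent ODE's initial state.
        self.motion_enc = nn.GRU(w_dim, z_dim, batch_first=True)
        self.ode = DynamicsODE(z_dim)
        # Maps (content, dynamics state) to a per-frame W+ latent.
        self.combine = nn.Linear(w_dim + z_dim, w_dim)

    def forward(self, w_plus_frames, query_times):
        # w_plus_frames: (B, T, w_dim) per-frame inversions into W+ space
        # query_times:   1-D tensor of (possibly non-integer) timestamps to decode
        _, h_c = self.content_enc(w_plus_frames)
        content = h_c[-1]                      # (B, w_dim) global content code
        _, h_m = self.motion_enc(w_plus_frames)
        z = h_m[-1]                            # (B, z_dim) initial dynamics state
        out, t_prev = [], query_times[0]
        for t in query_times:
            z = self.ode(z, t_prev, t)         # evolve dynamics continuously in time
            t_prev = t
            out.append(self.combine(torch.cat([content, z], dim=-1)))
        # (B, len(query_times), w_dim): latents to feed a pretrained StyleGAN generator
        return torch.stack(out, dim=1)

# Usage: non-integer or out-of-range timestamps correspond to interpolation
# and extrapolation; changing their spacing changes the output frame rate.
w_frames = torch.randn(1, 8, 512)               # inverted frames of an 8-frame clip
times = torch.tensor([0.0, 0.5, 1.0, 1.5])
latents = VidStyleODESketch()(w_frames, times)  # (1, 4, 512)
```

Because the dynamics state is defined at every real-valued $t$ rather than only at observed frame indices, decoding at arbitrary timestamps is what would enable the varying frame rates, interpolation, and extrapolation claimed above.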