In this paper, we propose to model video dynamics by learning the trajectory of independently inverted GAN latent codes. By treating each latent code as a moving particle and the latent space as a high-dimensional dynamic system, the entire sequence is viewed as discrete-time observations of a continuous trajectory of the initial latent code. The latent codes representing different frames are therefore reformulated as state transitions of the initial frame, which can be modeled by neural ordinary differential equations. The learned continuous trajectory allows us to perform infinite frame interpolation and consistent video manipulation. The latter is revisited for video editing: the core editing operations need only be applied to the first frame, while temporal consistency is maintained across all frames. Extensive experiments demonstrate that our method achieves state-of-the-art performance with substantially less computation.
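The idea above can be sketched numerically: an initial latent code is integrated forward through an ODE, and the latent code of any frame, including frames at arbitrary continuous times, is recovered as a state of that trajectory. This is a minimal illustration, not the paper's implementation: `latent_dynamics` stands in for the learned neural network with a fixed toy nonlinearity, and `integrate_trajectory`, `W`, and the Euler step size are all illustrative choices.

```python
import numpy as np

def latent_dynamics(z, t, W):
    # Stand-in for the learned dynamics f(z, t); the paper would use a
    # trained neural network here, we use a fixed nonlinear map.
    return np.tanh(W @ z)

def integrate_trajectory(z0, times, W, dt=0.01):
    """Euler-integrate the latent ODE from the initial code z0 and
    return the latent state at each requested (sorted) time."""
    codes = []
    z, t = z0.copy(), 0.0
    for t_target in times:
        while t < t_target:
            step = min(dt, t_target - t)   # do not overshoot the target
            z = z + step * latent_dynamics(z, t, W)
            t += step
        codes.append(z.copy())
    return np.stack(codes)

rng = np.random.default_rng(0)
z0 = rng.normal(size=8)            # inverted latent code of the first frame
W = 0.1 * rng.normal(size=(8, 8))  # toy dynamics parameters

# Latent codes at the observed frame times (discrete observations)...
frames = integrate_trajectory(z0, [0.0, 1.0, 2.0, 3.0], W)
# ...and at an arbitrary continuous time, i.e. frame interpolation.
midframe = integrate_trajectory(z0, [1.5], W)
```

Consistent editing follows the same pattern: an edit applied to `z0` alone propagates to every frame by re-integrating the trajectory from the edited initial code.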