Videos depict the evolution of complex dynamical systems over time as discrete image sequences. Generating controllable videos by learning the underlying dynamical system is an important yet underexplored topic in the computer vision community. This paper presents a novel framework, TiV-ODE, to generate highly controllable videos from a static image and a text caption. Specifically, our framework leverages the ability of Neural Ordinary Differential Equations~(Neural ODEs) to represent complex dynamical systems as a set of nonlinear ordinary differential equations. The resulting framework is capable of generating videos with both the desired dynamics and content. Experiments demonstrate that the proposed method generates highly controllable and visually consistent videos and is capable of modeling dynamical systems. Overall, this work is a significant step towards developing advanced controllable video generation models that can handle complex and dynamic scenes.
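To make the core idea concrete, the sketch below illustrates how a Neural ODE represents a dynamical system: a learned vector field $f_\theta$ defines $dz/dt = f_\theta(z, t)$, and sampling the integrated trajectory at discrete times yields one latent state per video frame. This is a minimal illustration only, not the TiV-ODE architecture; the network weights, Euler integrator, and state dimensions are all hypothetical stand-ins.

```python
import numpy as np

# Minimal sketch (NOT the TiV-ODE model): a Neural ODE treats the latent
# video state z as evolving under dz/dt = f_theta(z, t), where f_theta is
# a small neural network. Frames correspond to the trajectory z(t) sampled
# at discrete times.

rng = np.random.default_rng(0)
# Hypothetical dynamics network: one tanh hidden layer (16 units, 4-dim state).
W1, b1 = rng.normal(scale=0.1, size=(16, 4)), np.zeros(16)
W2, b2 = rng.normal(scale=0.1, size=(4, 16)), np.zeros(4)

def f_theta(z, t):
    """Learned vector field f_theta(z, t) approximating the true dynamics."""
    h = np.tanh(W1 @ z + b1)
    return W2 @ h + b2

def integrate(z0, t_grid):
    """Fixed-step Euler integration of dz/dt = f_theta(z, t).

    Real Neural ODE implementations use adaptive solvers (e.g. Dormand-
    Prince) with adjoint backpropagation; Euler keeps the sketch minimal.
    """
    traj = [z0]
    for t0, t1 in zip(t_grid[:-1], t_grid[1:]):
        z = traj[-1]
        traj.append(z + (t1 - t0) * f_theta(z, t0))
    return np.stack(traj)

# Sample the latent trajectory at 8 "frame" times in [0, 1].
z0 = rng.normal(size=4)          # latent encoding of the static input image
frames = integrate(z0, np.linspace(0.0, 1.0, 8))
print(frames.shape)  # (8, 4): one latent state per generated frame
```

Because the trajectory is defined at continuous time, such a model can in principle be sampled at arbitrary frame rates, which is what makes the ODE formulation attractive for controllable video generation.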