In this paper we focus on landscape animation, which aims to generate time-lapse videos from a single landscape image. Motion is crucial for landscape animation, as it determines how objects move in the generated videos. Existing methods can generate appealing videos by learning motion from real time-lapse videos. However, they suffer from inaccurate motion generation, which leads to unrealistic results. To tackle this problem, we propose FGLA, a model that generates high-quality, realistic videos by learning a Fine-Grained motion embedding for Landscape Animation. Our model consists of two parts: (1) a motion encoder that embeds time-lapse motion in a fine-grained way, and (2) a motion generator that produces realistic motion to animate input images. To train and evaluate on diverse time-lapse videos, we build the largest high-resolution Time-lapse video dataset with Diverse scenes, namely Time-lapse-D, which includes 16,874 video clips with over 10 million frames. Quantitative and qualitative experimental results demonstrate the superiority of our method. In particular, our method achieves relative improvements of 19% on LPIPS and 5.6% on FVD over state-of-the-art methods on our dataset. A user study with 700 human subjects shows that our approach visually outperforms existing methods by a large margin.