Upsampling videos of human activity is an interesting yet challenging task with many potential applications, ranging from gaming to entertainment and sports broadcasting. The main difficulty in synthesizing video frames in this setting stems from the highly complex and non-linear nature of human motion and the complex appearance and texture of the body. We propose to address these issues in a motion-guided frame-upsampling framework that is capable of producing realistic human motion and appearance. A novel motion model is trained to infer the non-linear skeletal motion between frames by leveraging a large-scale motion-capture dataset (AMASS). The high-frame-rate pose predictions are then used by a neural rendering pipeline to produce the full-frame output, taking pose and background consistency into consideration. Our pipeline requires only low-frame-rate videos and unpaired human motion data for training, without any high-frame-rate videos. Furthermore, we contribute the first evaluation dataset for this task, consisting of high-quality, high-frame-rate videos of human activities. Compared with state-of-the-art video interpolation techniques, our method produces in-between frames with better quality and accuracy, as evidenced by state-of-the-art results on pixel-level and distributional metrics and by comparative user evaluations. Our code and the collected dataset are available at https://git.io/Render-In-Between.