Temporal View Synthesis of Dynamic Scenes through 3D Object Motion Estimation with Multi-Plane Images

The challenge of graphically rendering high-frame-rate videos on low-compute devices can be addressed by periodically predicting future frames, which enhances the user experience in virtual reality applications. This is studied through the problem of temporal view synthesis (TVS), where the goal is to predict the next frames of a video given the previous frames and the head poses of the previous and the next frames. In this work, we consider the TVS of dynamic scenes in which both the user and the objects are moving. We design a framework that decouples the motion into user motion and object motion, so that the available user motion can be used effectively while predicting the next frames. We predict the motion of objects by isolating and estimating the 3D object motion in the past frames and then extrapolating it. We employ multi-plane images (MPI) as a 3D representation of the scenes and model the object motion as the 3D displacement between corresponding points in the MPI representation. To handle the sparsity in MPIs while estimating the motion, we incorporate partial convolutions and masked correlation layers to estimate corresponding points. The predicted object motion is then integrated with the given user or camera motion to generate the next frame. Using a disocclusion infilling module, we synthesize the regions uncovered by the camera and object motion. We develop a new synthetic dataset for TVS of dynamic scenes consisting of 800 videos at full HD resolution. We show through experiments on our dataset and the MPI Sintel dataset that our model outperforms all competing methods in the literature.
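To illustrate why sparsity in an MPI matters for motion estimation, the following is a minimal sketch of a partial convolution on a single-channel grid. It is not the authors' implementation: the function name `partial_conv2d`, the pure-Python list layout, and the omission of a bias term are assumptions made for this example, and the renormalization rule is the formulation commonly used in the image-inpainting literature, which the paper may adapt. The idea is that only valid (non-empty) pixels contribute to each output, and the response is rescaled by the fraction of valid pixels in the window, so empty regions of an MPI plane do not drag the features toward zero.

```python
def partial_conv2d(image, mask, kernel):
    """Partial convolution on a single-channel 2D grid (no padding, stride 1).

    image:  list of rows of floats.
    mask:   same shape as image; 1.0 where a pixel holds data, 0.0 where empty.
    kernel: square list of rows of weights (applied as cross-correlation,
            as is usual in deep learning layers).
    Returns (output, updated_mask).
    """
    k = len(kernel)
    height, width = len(image), len(image[0])
    window_size = k * k
    output, updated_mask = [], []
    for r in range(height - k + 1):
        out_row, mask_row = [], []
        for c in range(width - k + 1):
            weighted_sum = 0.0
            valid_count = 0.0
            for i in range(k):
                for j in range(k):
                    m = mask[r + i][c + j]
                    # Empty pixels are multiplied by 0 and so never contribute.
                    weighted_sum += kernel[i][j] * image[r + i][c + j] * m
                    valid_count += m
            if valid_count > 0:
                # Rescale by (window size / number of valid pixels).
                out_row.append(weighted_sum * window_size / valid_count)
                mask_row.append(1.0)  # window saw data, so output is valid
            else:
                out_row.append(0.0)
                mask_row.append(0.0)  # window was entirely empty
        output.append(out_row)
        updated_mask.append(mask_row)
    return output, updated_mask


# Hypothetical usage: a sparse 4x4 plane with a single valid pixel.
plane = [[5.0] * 4 for _ in range(4)]
valid = [[0.0] * 4 for _ in range(4)]
valid[1][1] = 1.0
mean_kernel = [[1.0 / 9.0] * 3 for _ in range(3)]
features, new_valid = partial_conv2d(plane, valid, mean_kernel)
```

With a mean kernel, every window that contains the one valid pixel recovers that pixel's value instead of a value diluted by the surrounding empty pixels, and the updated mask marks those outputs as valid for the next layer.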
