We address the problem of novel view video prediction: given a set of input video clips from one or multiple views, our network predicts the video from a novel view. The proposed approach does not require any priors and can predict the video across wider angular distances, up to 45 degrees, compared with recent studies that predict only small variations in viewpoint. Moreover, our method relies only on RGB frames to learn a dual representation, which is then used to generate the video from the novel viewpoint. The dual representation comprises a view-dependent and a global representation that incorporate complementary details to enable novel view video prediction. We demonstrate the effectiveness of our framework on two real-world datasets: NTU-RGB+D and CMU Panoptic. A comparison with state-of-the-art novel view video prediction methods shows improvements of 26.1% in SSIM, 13.6% in PSNR, and 60% in FVD scores, without using explicit priors from target views.
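To make the dual-representation idea concrete, the following is a minimal, hypothetical PyTorch sketch of how a view-dependent and a global (view-pooled) representation could be fused to condition a decoder that renders a frame for a novel viewpoint. The layer sizes, the mean pooling over views, and the fusion by concatenation are illustrative assumptions for exposition, not the paper's exact architecture.

```python
import torch
import torch.nn as nn


class DualRepresentationFusion(nn.Module):
    """Illustrative sketch: fuse a view-dependent and a global representation
    to condition a decoder that predicts a frame from a novel viewpoint.
    All design choices below are assumptions, not the paper's architecture."""

    def __init__(self, in_channels=3, feat_dim=64):
        super().__init__()
        # View-dependent encoder: applied to frames from each input view.
        self.view_encoder = nn.Sequential(
            nn.Conv2d(in_channels, feat_dim, 3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat_dim, feat_dim, 3, stride=2, padding=1),
        )
        # Global encoder: shared across views; its features are averaged
        # over views to form a view-agnostic summary of the scene.
        self.global_encoder = nn.Sequential(
            nn.Conv2d(in_channels, feat_dim, 3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(feat_dim, feat_dim, 3, stride=2, padding=1),
        )
        # Decoder: maps the fused representation back to an RGB frame.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(2 * feat_dim, feat_dim, 4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(feat_dim, in_channels, 4, stride=2, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, views):
        # views: (batch, num_views, channels, height, width) RGB frames.
        b, v, c, h, w = views.shape
        flat = views.view(b * v, c, h, w)
        view_feat = self.view_encoder(flat)            # per-view features
        glob_feat = self.global_encoder(flat)
        # Pool the global branch over views (view-agnostic representation).
        glob_feat = glob_feat.view(b, v, *glob_feat.shape[1:]).mean(dim=1)
        # Keep the first view's features as the view-dependent branch here.
        view_feat = view_feat.view(b, v, *view_feat.shape[1:])[:, 0]
        # Complementary details from both branches are concatenated.
        fused = torch.cat([view_feat, glob_feat], dim=1)
        return self.decoder(fused)


if __name__ == "__main__":
    model = DualRepresentationFusion()
    clips = torch.rand(2, 3, 3, 64, 64)   # 2 samples, 3 input views, 64x64 RGB
    novel_frame = model(clips)
    print(novel_frame.shape)               # torch.Size([2, 3, 64, 64])
```

In practice the per-frame fusion above would be extended over time (e.g., with recurrent or temporal convolution layers) to produce a full video clip rather than a single frame; the sketch only illustrates how the two representations carry complementary, view-specific and view-agnostic information.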