In this paper, we propose an approach for view-time interpolation of stereo videos. We build upon X-Fields, which uses a convolutional decoder to approximate an interpolatable mapping from input coordinates to 2D RGB images. Our main contribution is to analyze and identify the sources of the problems that arise when X-Fields is applied to this task, and to propose novel techniques that overcome them. Specifically, we observe that X-Fields struggles to implicitly interpolate the disparities for large-baseline cameras. We therefore propose multi-plane disparities to reduce the spatial distance between corresponding objects in the two stereo views. Moreover, we propose non-uniform time coordinates to handle non-linear motion and sudden motion spikes in videos. We additionally introduce several simple but important improvements over X-Fields. We demonstrate that our approach produces better results than the state of the art, while running at near real-time rates and having low memory and storage costs.
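To make the idea of non-uniform time coordinates concrete, the following is a minimal sketch and not the formulation used in the paper. It assumes, purely for illustration, that each pair of consecutive frames comes with a scalar motion magnitude (for example, a mean optical-flow magnitude) and that frames are spaced along the time axis in proportion to the accumulated motion. The function name `nonuniform_time_coords` and its `motion` input are hypothetical.

```python
def nonuniform_time_coords(motion):
    """Map per-interval motion magnitudes to time coordinates in [0, 1].

    ASSUMPTION (not from the paper): motion[i] is a non-negative scalar
    describing how much the scene changes between frame i and frame i + 1.
    Frames separated by large motion are placed further apart in time.
    """
    if any(m < 0 for m in motion):
        raise ValueError("motion magnitudes must be non-negative")

    n = len(motion)
    total = sum(motion)
    if total == 0:
        # No measured motion: fall back to uniform spacing.
        return [i / n for i in range(n + 1)] if n else [0.0]

    coords = [0.0]
    acc = 0.0
    for m in motion:
        acc += m
        coords.append(acc / total)  # cumulative share of the total motion
    return coords


# Constant motion reproduces uniform coordinates.
uniform = nonuniform_time_coords([1.0, 1.0, 1.0, 1.0])

# A motion spike between frames 2 and 3 stretches that interval.
spiky = nonuniform_time_coords([1.0, 1.0, 6.0, 2.0])
```

In this sketch, the interval containing the spike occupies a larger share of the coordinate range than its neighbors, so an interpolating network queried uniformly along the coordinate axis would spend more samples on it. Whether the paper derives its coordinates this way is not stated in the abstract.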