The view synthesis problem--generating novel views of a scene from known imagery--has garnered recent attention due in part to compelling applications in virtual and augmented reality. In this paper, we explore an intriguing scenario for view synthesis: extrapolating views from imagery captured by narrow-baseline stereo cameras, including VR cameras and now-widespread dual-lens camera phones. We call this problem stereo magnification, and propose a learning framework that leverages a new layered representation that we call multiplane images (MPIs). Our method also uses a massive new data source for learning view extrapolation: online videos on YouTube. Using data mined from such videos, we train a deep network that predicts an MPI from an input stereo image pair. This inferred MPI can then be used to synthesize a range of novel views of the scene, including views that extrapolate significantly beyond the input baseline. We show that our method compares favorably with several recent view synthesis methods, and demonstrate applications in magnifying narrow-baseline stereo images.
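To make the MPI representation concrete, the sketch below illustrates the standard way a stack of fronto-parallel RGBA planes can be rendered into a novel view: each plane is warped into the target camera by a plane-induced homography and the warped planes are alpha-composited back to front. This is a minimal illustration under assumed conventions, not the authors' implementation; the function name `render_mpi`, the intrinsics `K_ref`/`K_tgt`, and the pose `R, t` are illustrative placeholders.

```python
# Minimal sketch of MPI rendering: warp each RGBA plane into the target view with a
# plane-induced homography, then alpha-composite back to front ("over" operation).
# All names and conventions here are assumptions for illustration only.
import numpy as np
import cv2  # used only for the per-plane perspective warp


def render_mpi(rgba_planes, depths, K_ref, K_tgt, R, t, out_hw):
    """rgba_planes: list of HxWx4 float arrays (RGB + alpha) in the reference view,
    ordered back to front (largest depth first).
    depths: matching plane depths in the reference camera.
    R, t: rotation / translation taking reference-camera points into the target camera.
    """
    h, w = out_hw
    out = np.zeros((h, w, 3), dtype=np.float64)
    n = np.array([0.0, 0.0, 1.0])  # fronto-parallel planes: normal along the optical axis

    for rgba, d in zip(rgba_planes, depths):
        # Plane-induced homography from reference pixels to target pixels,
        # for the plane n^T X = d in the reference camera frame.
        H = K_tgt @ (R + np.outer(t, n) / d) @ np.linalg.inv(K_ref)
        warped = cv2.warpPerspective(rgba, H, (w, h))
        alpha = warped[..., 3:4]
        # Back-to-front compositing: nearer planes progressively occlude farther ones.
        out = warped[..., :3] * alpha + out * (1.0 - alpha)
    return out
```

Because the MPI is predicted once per input stereo pair, rendering a new viewpoint only requires re-running this cheap warp-and-composite step, which is what makes continuous view extrapolation from a single inferred representation practical.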