We present Worldsheet, a method for novel view synthesis using just a single RGB image as input. The main insight is that simply shrink-wrapping a planar mesh sheet onto the input image, consistent with the learned intermediate depth, captures underlying geometry sufficient to generate photorealistic unseen views with large viewpoint changes. To operationalize this, we propose a novel differentiable texture sampler that allows our wrapped mesh sheet to be textured and rendered differentiably into an image from a target viewpoint. Our approach is category-agnostic, end-to-end trainable without using any 3D supervision, and requires a single image at test time. We also explore a simple extension by stacking multiple layers of Worldsheets to better handle occlusions. Worldsheet consistently outperforms prior state-of-the-art methods on single-image view synthesis across several datasets. Furthermore, this simple idea captures novel views surprisingly well on a wide range of high-resolution in-the-wild images, converting them into navigable 3D pop-ups. Video results and code are available at https://worldsheet.github.io.