The movie and video game industries have adopted photogrammetry as a way to create digital 3D assets from multiple photographs of a real-world scene. But photogrammetry algorithms typically output an RGB texture atlas of the scene that only serves as visual guidance for skilled artists to create material maps suitable for physically-based rendering. We present a learning-based approach that automatically produces digital assets ready for physically-based rendering, by estimating approximate material maps from multi-view captures of indoor scenes, which are used together with retopologized geometry. We base our approach on a material estimation Convolutional Neural Network (CNN) that we execute on each input image. We leverage the view-dependent visual cues provided by the multiple observations of the scene by gathering, for each pixel of a given image, the color of the corresponding point in other images. This image-space CNN provides us with an ensemble of predictions, which we merge in texture space as the last step of our approach. Our results demonstrate that the recovered assets can be directly used for physically-based rendering and editing of real indoor scenes from any viewpoint and under novel lighting. Our method generates approximate material maps in a fraction of the time compared to the closest previous solutions.
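The multi-view cue described above relies on reprojecting each pixel of a reference image into the other captured views and sampling the colors observed there. The sketch below, assuming calibrated cameras (world-to-camera poses and intrinsics) and a per-view depth map as produced by standard photogrammetry pipelines, illustrates this gathering step; the function name, nearest-neighbor sampling, and lack of an explicit occlusion test are illustrative simplifications, not the exact implementation used in the paper.

```python
import numpy as np

def gather_multiview_colors(depth_ref, K_ref, pose_ref, other_images, other_Ks, other_poses):
    """For each pixel of the reference view, back-project to 3D using the
    reference depth map, project into every other view, and sample the color
    observed there. Returns an (H, W, V, 3) stack of per-view colors."""
    H, W = depth_ref.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).astype(np.float64)

    # Back-project reference pixels to world space (pose = 4x4 world-to-camera).
    cam_pts = (np.linalg.inv(K_ref) @ pix.T) * depth_ref.reshape(1, -1)
    cam_pts_h = np.vstack([cam_pts, np.ones((1, cam_pts.shape[1]))])
    world_pts = np.linalg.inv(pose_ref) @ cam_pts_h

    stacks = []
    for img, K, pose in zip(other_images, other_Ks, other_poses):
        # Project the world points into the other view.
        cam = (pose @ world_pts)[:3]
        uv = K @ cam
        uv = uv[:2] / np.clip(uv[2:3], 1e-6, None)
        u = np.clip(np.round(uv[0]).astype(int), 0, img.shape[1] - 1)
        v = np.clip(np.round(uv[1]).astype(int), 0, img.shape[0] - 1)
        colors = img[v, u].astype(np.float64)   # nearest-neighbor sampling
        colors[cam[2] <= 0] = 0.0               # zero out points behind the camera
        stacks.append(colors.reshape(H, W, 3))

    return np.stack(stacks, axis=2)
```

The resulting per-pixel color stack can be concatenated with the reference image as extra input channels to the material estimation CNN, so the network sees how each surface point's appearance varies across viewpoints.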