In this paper we show how to perform scene-level inverse rendering to recover shape, reflectance and lighting from a single, uncontrolled image using a fully convolutional neural network. The network takes an RGB image as input and regresses albedo, shadow and normal maps, from which we infer least-squares optimal spherical harmonic lighting coefficients. Our network is trained on large uncontrolled multiview and timelapse image collections without ground truth. By incorporating a differentiable renderer, our network can learn from self-supervision. Since the problem is ill-posed, we introduce additional supervision. Our key insight is to perform offline multiview stereo (MVS) on images containing rich illumination variation. From the MVS pose and depth maps, we can cross-project between overlapping views such that Siamese training can be used to ensure consistent estimation of photometric invariants. MVS depth also provides direct coarse supervision for normal map estimation. We believe this is the first attempt to use MVS supervision for learning inverse rendering. In addition, we learn a statistical natural illumination prior. We evaluate performance on inverse rendering, normal map estimation and intrinsic image decomposition benchmarks.
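The lighting inference step can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: it assumes an image formation model of the form image = albedo × shadow × (SH basis · lighting), an order-2 real spherical harmonic basis (9 coefficients per colour channel) with the standard normalisation constants, and flattened per-pixel arrays. The function names `sh_basis`, `render` and `solve_lighting` are hypothetical, and any cosine-lobe convolution factors are assumed to be absorbed into the coefficients.

```python
import numpy as np


def sh_basis(normals):
    """Order-2 real spherical harmonic basis for unit normals (N, 3) -> (N, 9)."""
    x, y, z = normals[:, 0], normals[:, 1], normals[:, 2]
    return np.stack([
        0.282095 * np.ones_like(x),
        0.488603 * y,
        0.488603 * z,
        0.488603 * x,
        1.092548 * x * y,
        1.092548 * y * z,
        0.315392 * (3.0 * z**2 - 1.0),
        1.092548 * x * z,
        0.546274 * (x**2 - y**2),
    ], axis=1)


def render(albedo, shadow, normals, coeffs):
    """Assumed image model: albedo * shadow * SH shading, returns (N, 3)."""
    shading = sh_basis(normals) @ coeffs          # (N, 3)
    return albedo * shadow[:, None] * shading


def solve_lighting(image, albedo, shadow, normals):
    """Least-squares SH lighting, one 9-vector per colour channel -> (9, 3)."""
    basis = sh_basis(normals)
    coeffs = np.empty((9, 3))
    for c in range(3):
        # Each design-matrix row is the SH basis scaled by albedo * shadow,
        # so the model is linear in the unknown lighting coefficients.
        design = (albedo[:, c] * shadow)[:, None] * basis
        coeffs[:, c], *_ = np.linalg.lstsq(design, image[:, c], rcond=None)
    return coeffs
```

Because the assumed model is linear in the lighting coefficients once albedo, shadow and normals are fixed, the lighting has a closed-form least-squares solution and does not need to be regressed separately; calling `solve_lighting` on the output of `render` should recover the coefficients used to render it.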