Classical light field rendering for novel view synthesis can accurately reproduce view-dependent effects such as reflection, refraction, and translucency, but requires a dense view sampling of the scene. Methods based on geometric reconstruction need only sparse views, but cannot accurately model non-Lambertian effects. We introduce a model that combines the strengths and mitigates the limitations of these two directions. By operating on a four-dimensional representation of the light field, our model learns to represent view-dependent effects accurately. By enforcing geometric constraints during training and inference, the scene geometry is implicitly learned from a sparse set of views. Concretely, we introduce a two-stage transformer-based model that first aggregates features along epipolar lines, then aggregates features along reference views to produce the color of a target ray. Our model outperforms the state-of-the-art on multiple forward-facing and 360° datasets, with larger margins on scenes with severe view-dependent variations.
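The two-stage aggregation described above can be sketched in miniature. This is a hedged illustration, not the paper's architecture: real transformer layers with learned projections are replaced by a single scaled dot-product attention step, the feature dimensions and the linear color head (weights `W`) are arbitrary placeholders, and the input features stand in for the epipolar-line samples and target-ray embedding the model would actually compute.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(query, keys, values):
    # Scaled dot-product attention: one query vector against a set of keys/values.
    scores = keys @ query / np.sqrt(query.shape[-1])
    return softmax(scores) @ values

def render_ray(target_feat, epipolar_feats):
    # Stage 1: for each reference view, aggregate the features of the points
    # sampled along the target ray's epipolar line into one per-view summary.
    view_feats = np.stack([attend(target_feat, pts, pts) for pts in epipolar_feats])
    # Stage 2: aggregate the per-view summaries across reference views.
    ray_feat = attend(target_feat, view_feats, view_feats)
    # A linear head (hypothetical weights) maps the ray feature to an RGB color.
    W = np.full((3, target_feat.shape[-1]), 0.1)
    return 1.0 / (1.0 + np.exp(-(W @ ray_feat)))  # sigmoid keeps color in [0, 1]

rng = np.random.default_rng(0)
d = 8                                        # feature dimension (arbitrary)
target_feat = rng.normal(size=d)             # embedding of the target ray
epipolar_feats = [rng.normal(size=(16, d))   # 16 sampled points per epipolar line
                  for _ in range(4)]         # 4 reference views
color = render_ray(target_feat, epipolar_feats)
print(color.shape)  # (3,)
```

Attending along epipolar lines first lets the model locate the surface point seen by the target ray in each view; attending across views second lets it blend view-dependent appearance, which is where non-Lambertian effects enter.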