Although many recent works have investigated generalizable NeRF-based novel view synthesis for unseen scenes, they seldom consider synthetic-to-real generalization, which is desirable in many practical applications. In this work, we first investigate the effects of synthetic data in synthetic-to-real novel view synthesis and, surprisingly, observe that models trained on synthetic data tend to produce sharper but less accurate volume densities. Where the predicted densities are correct, this sharpness yields fine-grained detail; where they are wrong, it causes severe artifacts. To retain the benefits of synthetic data while avoiding its negative effects, we introduce geometry-aware contrastive learning, which learns multi-view consistent features under geometric constraints. In addition, we adopt cross-view attention, which queries features across input views, to further strengthen the geometry awareness of the learned features. Experiments demonstrate that, under the synthetic-to-real setting, our method renders images with higher quality and finer details, outperforming existing generalizable novel view synthesis methods in terms of PSNR, SSIM, and LPIPS. When trained on real data, our method also achieves state-of-the-art results.
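To make the two components named above concrete, the following is a minimal sketch, not the authors' implementation: it pairs an InfoNCE-style contrastive loss over features sampled at geometrically corresponding points in two views with a cross-view attention block in which one view's features query the other input views. All module and function names, feature dimensions, the temperature value, and the exact loss form are illustrative assumptions; the paper's actual formulation may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def geometry_aware_contrastive_loss(feat_a, feat_b, temperature=0.07):
    """InfoNCE-style loss over features at corresponding 3D points in two
    views (hypothetical form). feat_a, feat_b: (N, C) features for N matched
    points; row i of feat_a and row i of feat_b are projections of the same
    point (the geometric constraint), so they form the positive pair."""
    a = F.normalize(feat_a, dim=-1)
    b = F.normalize(feat_b, dim=-1)
    logits = a @ b.t() / temperature          # (N, N) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    # Symmetric loss: each view's feature must retrieve its counterpart.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


class CrossViewAttention(nn.Module):
    """One view's features (queries) attend to features of the other input
    views (keys/values), so each token aggregates evidence across views."""

    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, query_feats, other_view_feats):
        # query_feats: (B, N, C) tokens of the reference view
        # other_view_feats: (B, M, C) tokens pooled from the other views
        out, _ = self.attn(query_feats, other_view_feats, other_view_feats)
        return self.norm(query_feats + out)   # residual connection + norm


if __name__ == "__main__":
    # Toy usage on random features: 128 matched points, 64-dim features.
    fa, fb = torch.randn(128, 64), torch.randn(128, 64)
    print("contrastive loss:", geometry_aware_contrastive_loss(fa, fb).item())
    attn = CrossViewAttention(dim=64, heads=4)
    q, kv = torch.randn(2, 100, 64), torch.randn(2, 300, 64)
    print("attended shape:", attn(q, kv).shape)  # (2, 100, 64)
```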