Most Neural Radiance Fields (NeRFs) generalize poorly, which limits their use when a single model must represent multiple scenes. To mitigate this problem, existing methods simply condition NeRF models on image features and therefore lack a global understanding and modeling of the entire 3D scene. Inspired by the success of mask-based modeling in other research fields, we propose a masked ray and view modeling method for generalizable NeRF (MRVM-NeRF), the first attempt to incorporate mask-based pretraining into 3D implicit representations. Specifically, since the core of NeRF lies in modeling 3D representations along the rays and across the views, we randomly mask a proportion of the points sampled along the ray at the fine stage by discarding part of the information obtained from multiple viewpoints, and train the model to predict the corresponding features produced in the coarse branch. In this way, the prior knowledge of 3D scenes learned during pretraining helps the model generalize better to novel scenarios after finetuning. Extensive experiments demonstrate the superiority of MRVM-NeRF in various synthetic and real-world settings, both qualitatively and quantitatively. Our empirical studies further confirm the effectiveness of MRVM, which is designed specifically for NeRF models.
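The masking step described above can be illustrated with a minimal NumPy sketch. It is not the MRVM-NeRF implementation: the function names, the tensor shapes, the zero-valued mask token, and the mean-squared-error objective are all assumptions chosen for illustration, and the network that would predict features from the masked input is omitted, so the masked fine-stage features are compared with the coarse-branch features directly.

```python
import numpy as np


def random_point_mask(n_rays, n_points, mask_ratio, rng):
    """Return a boolean array of shape (n_rays, n_points); True marks a masked point.

    Each ray gets the same number of masked points (hypothetical design choice).
    """
    n_masked = int(round(mask_ratio * n_points))
    # Rank independent random scores per ray; the lowest-ranked points are masked.
    scores = rng.random((n_rays, n_points))
    ranks = scores.argsort(axis=1).argsort(axis=1)
    return ranks < n_masked


def apply_mask(features, mask, mask_token):
    """Replace the features of masked points with a placeholder token.

    features: (n_rays, n_points, dim), mask: (n_rays, n_points), mask_token: (dim,).
    Using a fixed token is an assumption of this sketch.
    """
    return np.where(mask[..., None], mask_token, features)


def masked_feature_loss(predicted, target, mask):
    """Mean squared error over masked points only (a stand-in objective)."""
    if not mask.any():
        return 0.0
    squared_error = (predicted - target) ** 2
    return float(squared_error[mask].mean())


# Toy data: 4 rays, 16 sampled points per ray, 8-dimensional point features.
rng = np.random.default_rng(0)
fine_features = rng.normal(size=(4, 16, 8))    # stand-in for fine-stage features
coarse_features = rng.normal(size=(4, 16, 8))  # stand-in for coarse-branch targets

mask = random_point_mask(n_rays=4, n_points=16, mask_ratio=0.5, rng=rng)
masked_fine = apply_mask(fine_features, mask, mask_token=np.zeros(8))
loss = masked_feature_loss(masked_fine, coarse_features, mask)
```

In a real pretraining setup, a learned model would process `masked_fine` before the loss is computed, and the loss would be evaluated only at the masked positions, as in `masked_feature_loss`.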