In this work, we address the 3D scene stylization problem: generating stylized images of a scene at arbitrary novel view angles. A straightforward solution is to combine existing novel view synthesis and image/video style transfer approaches, but this often leads to blurry results or inconsistent appearance across views. Inspired by the high-quality results of the neural radiance fields (NeRF) method, we propose a joint framework that directly renders novel views in the desired style. Our framework consists of two components: an implicit representation of the 3D scene based on the neural radiance field model, and a hypernetwork that transfers the style information into the scene representation. In particular, our implicit representation model disentangles the scene into geometry and appearance branches, and the hypernetwork learns to predict the parameters of the appearance branch from the reference style image. To alleviate the training difficulties and memory burden, we propose a two-stage training procedure and a patch sub-sampling approach for optimizing the style and content losses with the neural radiance field model. After optimization, our model is able to render consistent novel views at arbitrary view angles with arbitrary style. Both a quantitative evaluation and a human subject study demonstrate that the proposed method generates faithful stylization results with consistent appearance across different views.
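The patch sub-sampling idea can be illustrated with a minimal sketch. The abstract does not specify the sampling scheme, so the following assumes one plausible variant: a small grid of pixels is drawn with a fixed stride, so that the rendered patch spans a large image region (which style and content losses typically need) while only a fraction of the rays are evaluated. The function name `sample_strided_patch` and its parameters are hypothetical and are not taken from the paper.

```python
import random


def sample_strided_patch(height, width, patch_size, stride, rng=None):
    """Return (row, col) pixel coordinates of a patch_size x patch_size grid.

    Neighbouring grid points are `stride` pixels apart, and the grid is placed
    at a random location inside a height x width image. This is an assumed,
    illustrative scheme, not the authors' exact procedure.
    """
    rng = rng or random.Random()
    # Number of image pixels the strided grid spans along each axis.
    extent = (patch_size - 1) * stride + 1
    if extent > height or extent > width:
        raise ValueError("strided patch does not fit in the image")
    # randint is inclusive on both ends, so the grid always stays in bounds.
    top = rng.randint(0, height - extent)
    left = rng.randint(0, width - extent)
    return [
        (top + i * stride, left + j * stride)
        for i in range(patch_size)
        for j in range(patch_size)
    ]


coords = sample_strided_patch(
    height=256, width=256, patch_size=32, stride=4, rng=random.Random(0)
)
print(len(coords))  # 1024 rays cover a 125 x 125 pixel region
```

Under these assumptions, only the 1024 sampled rays would be rendered and passed to the losses per iteration, rather than all 15625 pixels of the covered region, which is how such a scheme could reduce the memory footprint.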