Representing visual signals with implicit representations (e.g., a coordinate-based deep network) has prevailed in many vision tasks. This work explores a new, intriguing direction: training a stylized implicit representation using a generalized approach that applies to various 2D and 3D scenarios. We conduct a pilot study on a variety of implicit functions, including 2D coordinate-based representations, neural radiance fields, and signed distance functions. Our solution is a unified Implicit Neural Stylization framework, dubbed INS. In contrast to vanilla implicit representations, INS decouples the ordinary implicit function into a style implicit module and a content implicit module, which separately encode representations from the style image and the input scene. An amalgamation module is then applied to aggregate this information and synthesize the stylized output. To regularize the geometry of 3D scenes, we propose a novel self-distillation geometry consistency loss that preserves the geometric fidelity of the stylized scenes. Comprehensive experiments are conducted on multiple task settings, including novel view synthesis of complex scenes, stylization of implicit surfaces, and fitting images with MLPs. We further demonstrate that the learned representation is continuous not only spatially but also style-wise, allowing effortless interpolation between different styles and generation of images with new, mixed styles. Please refer to the video on our project page for more view synthesis results: https://zhiwenfan.github.io/INS.
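The decomposition described above (a content implicit module and a style implicit module whose features are fused by an amalgamation module) can be illustrated with a minimal sketch. This is not the authors' implementation; the network sizes, the style latent code, and the concatenation-based fusion are all illustrative assumptions, shown here with plain-numpy MLPs for a 2D coordinate-based setting.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes):
    """Random weights for a small MLP; layer sizes are illustrative."""
    return [(rng.normal(scale=0.1, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(x, weights):
    """Forward pass: ReLU hidden layers, linear output."""
    for W, b in weights[:-1]:
        x = np.maximum(x @ W + b, 0.0)
    W, b = weights[-1]
    return x @ W + b

# Content module: 2D coordinates -> content features (hypothetical sizes).
content_net = init_mlp([2, 64, 32])
# Style module: style latent code -> style features (latent dim assumed to be 8).
style_net = init_mlp([8, 64, 32])
# Amalgamation module: concatenated features -> RGB.
amalgam_net = init_mlp([32 + 32, 64, 3])

def ins_forward(coords, style_code):
    """Stylized output = amalgamation(content(coords), style(code))."""
    f_content = mlp(coords, content_net)
    # Broadcast the per-image style code to every query coordinate.
    f_style = mlp(np.broadcast_to(style_code, (coords.shape[0], style_code.shape[-1])),
                  style_net)
    fused = np.concatenate([f_content, f_style], axis=-1)
    return mlp(fused, amalgam_net)

coords = rng.uniform(-1.0, 1.0, size=(5, 2))   # five query pixel coordinates
style = rng.normal(size=(8,))                   # one style latent code
out = ins_forward(coords, style)
print(out.shape)  # (5, 3): an RGB value per queried coordinate
```

Because the style enters only through a latent code, interpolating between two codes (e.g., `0.5 * s1 + 0.5 * s2`) yields the style-wise continuity the abstract mentions, without retraining the content module.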