Hardware Acceleration of Neural Graphics

Rendering and inverse-rendering algorithms that drive conventional computer graphics have recently been superseded by neural representations (NRs). NRs have been used to learn the geometric and material properties of scenes and to use that information to synthesize photorealistic imagery, thereby promising a replacement for traditional rendering algorithms with scalable quality and predictable performance. In this work we ask the question: does neural graphics (NG) need hardware support? We study representative NG applications and show that rendering 4K resolution at 60 FPS leaves a gap of 1.5X-55X between the desired performance and what current GPUs deliver. For AR/VR applications, there is an even larger gap of 2-4 orders of magnitude (OOM) between the desired performance and the required system power. We identify the input encoding and the MLP kernels as the performance bottlenecks, consuming 72%, 60%, and 59% of application time for multi-resolution hashgrid, multi-resolution densegrid, and low-resolution densegrid encodings, respectively. We propose an NG processing cluster (NGPC), a scalable and flexible hardware architecture that directly accelerates the input encoding and MLP kernels through dedicated engines and supports a wide range of NG applications. We also accelerate the remaining kernels by fusing them together in Vulkan, which yields a 9.94X kernel-level performance improvement over the un-fused implementation of the pre-processing and post-processing kernels. Our results show that NGPC delivers up to a 58X end-to-end application-level performance improvement. For multi-resolution hashgrid encoding, averaged across the four NG applications, the performance benefits are 12X, 20X, 33X, and 39X for scaling factors of 8, 16, 32, and 64, respectively. With multi-resolution hashgrid encoding, NGPC enables rendering at 4K resolution and 30 FPS for NeRF, and at 8K resolution and 120 FPS for all our other NG applications.
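The bottleneck percentages in the abstract invite a quick back-of-the-envelope check. The sketch below is not from the paper. It applies Amdahl's law to the quoted fractions to estimate how much end-to-end speedup is possible if only the input encoding and MLP kernels are accelerated, and it works out the throughput implied by a 4K, 60 FPS target. The function names are hypothetical, and "4K" is assumed to mean 3840x2160.

```python
# Back-of-the-envelope sketch. All function names are illustrative assumptions,
# not taken from the paper or from any library.

# Fraction of application time spent in input encoding + MLP kernels,
# as quoted in the abstract.
BOTTLENECK_FRACTION = {
    "multi-resolution hashgrid": 0.72,
    "multi-resolution densegrid": 0.60,
    "low-resolution densegrid": 0.59,
}


def amdahl_speedup(accelerated_fraction: float, engine_speedup: float) -> float:
    """End-to-end speedup when only `accelerated_fraction` of the runtime
    is sped up by `engine_speedup` (Amdahl's law)."""
    rest_fraction = 1.0 - accelerated_fraction
    return 1.0 / (rest_fraction + accelerated_fraction / engine_speedup)


def speedup_ceiling(accelerated_fraction: float) -> float:
    """Upper bound on end-to-end speedup as the engine speedup grows without
    limit, with the remaining kernels left untouched."""
    return 1.0 / (1.0 - accelerated_fraction)


def pixels_per_second(width: int, height: int, fps: int) -> int:
    """Pixel throughput required to hit a resolution/frame-rate target."""
    return width * height * fps


def frame_budget_ms(fps: int) -> float:
    """Time available to render one frame, in milliseconds."""
    return 1000.0 / fps


if __name__ == "__main__":
    for encoding, fraction in BOTTLENECK_FRACTION.items():
        print(f"{encoding}: ceiling {speedup_ceiling(fraction):.2f}X "
              f"if only encoding + MLP are accelerated")

    # Assumption: "4K" means 3840x2160 (UHD).
    print(f"4K at 60 FPS: {pixels_per_second(3840, 2160, 60):,} pixels/s, "
          f"{frame_budget_ms(60):.2f} ms per frame")
```

Under these assumptions, accelerating only the encoding and MLP kernels caps the end-to-end gain at roughly 2.4X to 3.6X, however fast the dedicated engines are. That is one way to read why the authors also speed up the remaining pre-processing and post-processing kernels through fusion.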
