Hardware Acceleration of Neural Graphics

Rendering and inverse-rendering algorithms that drive conventional computer graphics have recently been superseded by neural representations (NRs). NRs learn the geometric and material properties of a scene and use that information to synthesize photorealistic imagery. This promises to replace traditional rendering algorithms with scalable quality and predictable performance. In this work we ask: does neural graphics (NG) need hardware support? Our study of representative NG applications shows that rendering at 4k resolution and 60 FPS on current GPUs requires 1.5X to 55X more performance than they deliver. For AR/VR applications the gap is even larger: two to four orders of magnitude (OOM) between the desired performance and the required system power. We identify the input encoding and MLP kernels as the performance bottlenecks. They consume 72%, 60% and 59% of application time for multi-resolution hashgrid, multi-resolution densegrid and low-resolution densegrid encodings, respectively. We propose the NG processing cluster (NGPC), a scalable and flexible hardware architecture that directly accelerates the input encoding and MLP kernels through dedicated engines and supports a wide range of NG applications. We also accelerate the remaining kernels by fusing them in Vulkan, which yields a 9.94X kernel-level speedup over an unfused implementation of the pre-processing and post-processing kernels. Our results show that NGPC delivers up to 58X end-to-end application-level speedup. For multi-resolution hashgrid encoding, the average speedups across the four NG applications are 12X, 20X, 33X and 39X for scaling factors of 8, 16, 32 and 64, respectively. With multi-resolution hashgrid encoding, NGPC enables rendering at 4k resolution and 30 FPS for NeRF, and at 8k resolution and 120 FPS for all our other NG applications.
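To put the resolution and frame-rate targets on a common scale, the sketch below converts each one into pixels per second. It assumes "4k" means 3840x2160 (UHD) and "8k" means 7680x4320; the abstract does not state exact dimensions. `RESOLUTIONS` and `pixels_per_second` are illustrative names, not anything from the paper.

```python
# Back-of-the-envelope pixel throughput for the targets quoted in the abstract.
# ASSUMPTION: "4k" = 3840x2160 (UHD) and "8k" = 7680x4320.
RESOLUTIONS = {
    "4k": (3840, 2160),
    "8k": (7680, 4320),
}


def pixels_per_second(res: str, fps: int) -> int:
    """Pixels that must be produced each second to sustain `fps` at `res`."""
    width, height = RESOLUTIONS[res]
    return width * height * fps


for res, fps in [("4k", 60), ("4k", 30), ("8k", 120)]:
    print(f"{res}@{fps}FPS: {pixels_per_second(res, fps) / 1e6:.1f} Mpixel/s")

# 8k has 4x the pixels of 4k, and 120 FPS is 2x 60 FPS, so 8x the throughput.
ratio = pixels_per_second("8k", 120) / pixels_per_second("4k", 60)
print(f"8k@120 / 4k@60 = {ratio:.0f}x")
```

Under these assumptions, the 8k at 120 FPS result corresponds to eight times the pixel rate of the 4k at 60 FPS target.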
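Amdahl's law explains why accelerating only the encoding and MLP kernels would not be enough. If a fraction f of run time is sped up by a factor s and the rest is unchanged, the end-to-end speedup is 1 / ((1 - f) + f / s), and it can never exceed 1 / (1 - f). The sketch below applies this to the bottleneck fractions from the abstract. It assumes the non-bottleneck kernels are left untouched. The 100x engine speedup is a hypothetical input, not a figure from the paper.

```python
def amdahl_speedup(fraction: float, kernel_speedup: float) -> float:
    """End-to-end speedup when `fraction` of run time is sped up by
    `kernel_speedup` and the remaining (1 - fraction) is left unchanged."""
    return 1.0 / ((1.0 - fraction) + fraction / kernel_speedup)


# Share of application time spent in input encoding + MLP, per the abstract.
bottleneck_fraction = {
    "multi res. hashgrid": 0.72,
    "multi res. densegrid": 0.60,
    "low res. densegrid": 0.59,
}

HYPOTHETICAL_ENGINE_SPEEDUP = 100.0  # illustrative only, not from the paper

for encoding, f in bottleneck_fraction.items():
    with_engine = amdahl_speedup(f, HYPOTHETICAL_ENGINE_SPEEDUP)
    ceiling = amdahl_speedup(f, float("inf"))  # encoding + MLP take zero time
    print(f"{encoding}: {with_engine:.2f}x at 100x engines, ceiling {ceiling:.2f}x")
```

The ceilings (about 3.6X, 2.5X and 2.4X) are far below the end-to-end gains reported above. That gap is consistent with the authors also accelerating the remaining kernels through Vulkan kernel fusion.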
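The abstract identifies multi-resolution hashgrid encoding as the input encoding with the largest bottleneck share. As an illustration of what such a kernel computes, here is a minimal pure-Python sketch. It assumes the formulation from Müller et al.'s Instant-NGP: a per-level spatial hash using the primes 1, 2654435761 and 805459861, geometrically growing grid resolutions, and trilinear interpolation. The class name, parameter names and defaults are hypothetical. This is not NGPC's encoding engine. For simplicity it also hashes every level, whereas Instant-NGP indexes coarse levels directly when the dense grid fits in the table.

```python
import math
import random

# Per-dimension primes for the spatial hash (as in Instant-NGP).
PRIMES = (1, 2654435761, 805459861)


class HashGridEncoding:
    """Hypothetical minimal multi-resolution hash-grid encoding for x in [0, 1]^3."""

    def __init__(self, n_levels=4, features_per_level=2, log2_table_size=12,
                 base_res=16, max_res=512, seed=0):
        rng = random.Random(seed)
        self.n_levels = n_levels
        self.f = features_per_level
        self.table_size = 1 << log2_table_size
        # Geometric growth factor between successive grid resolutions.
        growth = math.exp((math.log(max_res) - math.log(base_res)) / (n_levels - 1))
        self.resolutions = [math.floor(base_res * growth ** level)
                            for level in range(n_levels)]
        # One table of feature vectors per level, small uniform init (learnable in practice).
        self.tables = [
            [[rng.uniform(-1e-4, 1e-4) for _ in range(self.f)]
             for _ in range(self.table_size)]
            for _ in range(n_levels)
        ]

    def _hash(self, corner):
        """XOR of coordinate * prime (uint32 wrap-around), modulo the table size."""
        h = 0
        for c, p in zip(corner, PRIMES):
            h ^= (c * p) & 0xFFFFFFFF
        return h % self.table_size

    def encode(self, x):
        """Concatenate trilinearly interpolated features from every level."""
        out = []
        for level, res in enumerate(self.resolutions):
            table = self.tables[level]
            scaled = [xi * res for xi in x]
            base = [math.floor(s) for s in scaled]
            frac = [s - b for s, b in zip(scaled, base)]
            feat = [0.0] * self.f
            # 8 corners of the enclosing voxel -> 8 scattered table reads per level.
            for corner_bits in range(8):
                corner, weight = [], 1.0
                for d in range(3):
                    bit = (corner_bits >> d) & 1
                    corner.append(base[d] + bit)
                    weight *= frac[d] if bit else (1.0 - frac[d])
                entry = table[self._hash(corner)]
                for k in range(self.f):
                    feat[k] += weight * entry[k]
            out.extend(feat)
        return out


enc = HashGridEncoding()
features = enc.encode([0.25, 0.5, 0.75])
print(len(features))  # 4 levels x 2 features per level = 8 values
```

In a full NG pipeline, this feature vector would be the input to the MLP kernel. Each encoded point costs 8 x L scattered table reads for L levels. That cost is one plausible reason this kernel consumes so much application time on GPUs.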
