Rendering and inverse-rendering algorithms that drive conventional computer graphics have recently been superseded by neural representations (NRs). NRs have been used to learn the geometric and material properties of scenes and to use that information to synthesize photorealistic imagery, thereby promising a replacement for traditional rendering algorithms with scalable quality and predictable performance. In this work we ask: does neural graphics (NG) need hardware support? We study representative NG applications and show that rendering 4K resolution at 60 FPS leaves a gap of 1.5X-55X between the desired performance and what current GPUs deliver. For AR/VR applications, the gap between the desired performance and the required system power is even larger, at 2-4 orders of magnitude. We identify the input encoding and MLP kernels as the performance bottlenecks, consuming 72%, 60%, and 59% of application time for multi-resolution hashgrid, multi-resolution densegrid, and low-resolution densegrid encodings, respectively. We propose the NG processing cluster (NGPC), a scalable and flexible hardware architecture that directly accelerates the input encoding and MLP kernels through dedicated engines and supports a wide range of NG applications. We also accelerate the remaining kernels by fusing them together in Vulkan, which yields a 9.94X kernel-level performance improvement over the un-fused implementation of the pre-processing and post-processing kernels. Our results show that NGPC gives up to 58X end-to-end application-level performance improvement. For multi-resolution hashgrid encoding, averaged across the four NG applications, the performance benefits are 12X, 20X, 33X, and 39X for scaling factors of 8, 16, 32, and 64, respectively. With multi-resolution hashgrid encoding, NGPC enables rendering at 4K resolution at 30 FPS for NeRF and at 8K resolution at 120 FPS for all our other NG applications.