Neural graphics primitives, parameterized by fully connected neural networks, can be costly to train and evaluate. We reduce this cost with a versatile new input encoding that permits the use of a smaller network without sacrificing quality, thus significantly reducing the number of floating point and memory access operations: a small neural network is augmented by a multiresolution hash table of trainable feature vectors whose values are optimized through stochastic gradient descent. The multiresolution structure allows the network to disambiguate hash collisions, making for a simple architecture that is trivial to parallelize on modern GPUs. We leverage this parallelism by implementing the whole system using fully-fused CUDA kernels with a focus on minimizing wasted bandwidth and compute operations. We achieve a combined speedup of several orders of magnitude, enabling training of high-quality neural graphics primitives in a matter of seconds, and rendering in tens of milliseconds at a resolution of ${1920\!\times\!1080}$.
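Below is a minimal sketch of one level of a multiresolution hash-encoding lookup, intended only to illustrate the idea described above, not the authors' fully-fused implementation. The query point's enclosing grid cell is found at the level's resolution, each of the eight integer corner vertices is hashed into a table of trainable feature vectors, and the corner features are trilinearly interpolated. The table size `T`, feature width `F`, prime multipliers, grid resolution, and the names `hash3` and `encode_level` are illustrative assumptions.

```cuda
// Sketch of one level of a multiresolution hash encoding (single level only;
// a full encoding concatenates the outputs of several levels at increasing
// resolutions). Table size, feature width, and primes are illustrative.
#include <cstdint>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int F = 2;        // features per hash-table entry (illustrative)
constexpr int T = 1 << 14;  // hash-table size for this level (illustrative)

__device__ uint32_t hash3(uint32_t x, uint32_t y, uint32_t z) {
    // Spatial hash: XOR of coordinates scaled by large primes, modulo T.
    return (x * 1u ^ y * 2654435761u ^ z * 805459861u) % T;
}

__global__ void encode_level(const float3* pts, int n, int res,
                             const float* table, float* out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // Scale the unit-cube input to this level's grid resolution.
    float px = pts[i].x * res, py = pts[i].y * res, pz = pts[i].z * res;
    uint32_t x0 = (uint32_t)px, y0 = (uint32_t)py, z0 = (uint32_t)pz;
    float fx = px - x0, fy = py - y0, fz = pz - z0;

    float acc[F] = {};
    for (int c = 0; c < 8; ++c) {  // eight corners of the enclosing cell
        uint32_t cx = x0 + (c & 1), cy = y0 + ((c >> 1) & 1), cz = z0 + (c >> 2);
        float w = ((c & 1) ? fx : 1.f - fx)
                * (((c >> 1) & 1) ? fy : 1.f - fy)
                * ((c >> 2) ? fz : 1.f - fz);
        const float* feat = table + (size_t)hash3(cx, cy, cz) * F;
        for (int k = 0; k < F; ++k) acc[k] += w * feat[k];  // trilinear blend
    }
    for (int k = 0; k < F; ++k) out[i * F + k] = acc[k];
}

int main() {
    const int n = 2, res = 16;  // two query points, one coarse level
    std::vector<float3> pts = {{0.1f, 0.2f, 0.3f}, {0.7f, 0.5f, 0.9f}};
    std::vector<float> table(T * F, 0.01f);  // trainable via SGD in practice

    float3* d_pts; float *d_table, *d_out;
    cudaMalloc(&d_pts, n * sizeof(float3));
    cudaMalloc(&d_table, table.size() * sizeof(float));
    cudaMalloc(&d_out, n * F * sizeof(float));
    cudaMemcpy(d_pts, pts.data(), n * sizeof(float3), cudaMemcpyHostToDevice);
    cudaMemcpy(d_table, table.data(), table.size() * sizeof(float),
               cudaMemcpyHostToDevice);

    encode_level<<<1, 32>>>(d_pts, n, res, d_table, d_out);

    std::vector<float> out(n * F);
    cudaMemcpy(out.data(), d_out, out.size() * sizeof(float),
               cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i)
        printf("point %d -> (%f, %f)\n", i, out[i * F], out[i * F + 1]);

    cudaFree(d_pts); cudaFree(d_table); cudaFree(d_out);
    return 0;
}
```

The interpolated feature vectors from all levels would then be concatenated and fed to the small MLP; because collisions at fine levels are averaged out across resolutions, the network can disambiguate them without any explicit collision handling in the lookup itself.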