Rendering Point Clouds with Compute Shaders and Vertex Order Optimization

While commodity GPUs provide a continuously growing range of features and sophisticated methods for accelerating compute jobs, many state-of-the-art solutions for point cloud rendering still rely on the point primitives (GL_POINTS, POINTLIST, ...) that graphics APIs provide for image synthesis. In this paper, we present several compute-based point cloud rendering approaches that outperform the hardware pipeline by up to an order of magnitude and achieve significantly better frame times than previous compute-based methods. Beyond basic closest-point rendering, we also introduce a fast, high-quality variant to reduce aliasing. We present and evaluate several variants of our proposed methods with different flavors of optimization, in order to ensure their applicability and achieve optimal performance on a range of platforms and architectures with varying support for novel GPU hardware features. In our experiments, peak performance was reached when rendering 796 million points (12.7 GB) at 62 to 64 frames per second (50 billion points per second, 802 GB/s) on an RTX 3090 without the use of level-of-detail structures. We further introduce an optimized vertex order for point clouds that boosts the efficiency of GL_POINTS by a factor of 5 in cases where hardware rendering is compulsory. We compare different orderings and show that Morton-sorted buffers are faster for some viewpoints, while shuffled vertex buffers are faster for others. Combining both approaches, by first sorting according to Morton code and then shuffling the resulting sequence in batches of 128 points, leads to a vertex buffer layout with high rendering performance and low sensitivity to viewpoint changes.
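
The combined vertex order described at the end of the abstract can be illustrated with a minimal CPU-side sketch. It is not the implementation evaluated in the paper. It assumes that "shuffling in batches of 128 points" means that each run of 128 consecutive Morton-sorted points stays together while the batches themselves are permuted, and it assumes 10 bits per axis for the Morton code. The function names (`morton_code`, `quantize`, `shuffled_morton_order`) and the `seed` parameter are hypothetical and chosen only for illustration.

```python
import random

BATCH_SIZE = 128  # batch size stated in the abstract


def morton_code(ix, iy, iz, bits=10):
    """Interleave the lowest `bits` bits of three integer cell coordinates."""
    code = 0
    for b in range(bits):
        code |= ((ix >> b) & 1) << (3 * b)
        code |= ((iy >> b) & 1) << (3 * b + 1)
        code |= ((iz >> b) & 1) << (3 * b + 2)
    return code


def quantize(points, bits=10):
    """Map (x, y, z) floats to integer cells of a cubic grid over the bounding box."""
    lo = [min(p[a] for p in points) for a in range(3)]
    hi = [max(p[a] for p in points) for a in range(3)]
    extent = max(h - l for l, h in zip(lo, hi)) or 1.0  # avoid division by zero
    cells = (1 << bits) - 1
    return [
        tuple(min(cells, int((p[a] - lo[a]) / extent * cells)) for a in range(3))
        for p in points
    ]


def shuffled_morton_order(points, batch_size=BATCH_SIZE, seed=0):
    """Return a permutation of point indices: Morton-sorted, then batch-shuffled.

    Assumption: points inside a batch keep their Morton order and only the
    order of the batches is randomized.
    """
    cells = quantize(points)
    order = sorted(range(len(points)), key=lambda i: morton_code(*cells[i]))
    batches = [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
    random.Random(seed).shuffle(batches)
    return [i for batch in batches for i in batch]
```

In such a setup, the returned permutation would be applied to the vertex buffer once before upload, for example `reordered = [points[i] for i in shuffled_morton_order(points)]`. The intent is that each batch remains spatially coherent, while any contiguous range of the buffer still samples the whole model.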
