Computation of bounding boxes is a fundamental problem in high performance rendering, as it is an input to visibility culling and binning operations. In a scene description structured as a tree, clip nodes and blend nodes entail intersection and union of bounding boxes, respectively. These are straightforward to compute on the CPU using a sequential algorithm, but an efficient, parallel GPU algorithm is more elusive. This paper presents a fast and practical solution, with a new algorithm for the classic parentheses matching problem at its core. The core algorithm is presented abstractly (in terms of a PRAM abstraction), then with a concrete mapping to the thread, workgroup, and dispatch levels of real GPU hardware. The algorithm is implemented portably using compute shaders, and performance results show a dramatic speedup over a sequential CPU version, and indeed a reasonable fraction of maximum theoretical throughput of the GPU hardware. The immediate motivating application is 2D rendering, but the algorithms generalize to other domains, and the core parentheses matching problem has other applications including parsing.
翻译:捆绑框的计算是高性能转换的一个根本问题, 因为它是为可见度的抓取和宾入操作输入的。 在以树形结构构建的场景描述中, 剪切节点和混合节点分别涉及交叉和捆绑框的结合。 这些是使用顺序算法在 CPU 上进行计算的直接方法, 但一个高效、 平行的 GPU 算法比较难以找到。 本文提出了一个快速和实用的解决方案, 给经典括号在核心上匹配问题。 核心算法是抽象的( 以 PRAM 抽象化的方式), 然后对真正的 GPU 硬件的线条、 工作组和调度级别进行混凝土绘图。 算法是使用可移植的阴影器可移植地执行的, 其性能结果显示对按顺序的CPU 版本的快速加速, 以及实际上对 GPU 硬件的最大理论吞吐量的合理部分。 即时动机应用是 2D, 但算法一般到其它区域, 核心括号匹配问题有其他应用程序。