We present a dynamically Growable GPU array (GGArray), implemented entirely on the GPU, that does not require synchronization with the host. The goal is to simplify the programming of GPU applications that need dynamic memory by offering a structure that does not require pre-allocating GPU VRAM for the worst-case scenario. The GGArray is built on the LFVector: it uses an array of LFVectors to take advantage of the GPU architecture and of the synchronization offered by thread blocks. We compare the structure against state-of-the-art alternatives, namely a pre-allocated static array and a semi-static array that must be resized through communication with the host. Experimental evaluation shows that the GGArray has competitive insertion and resize performance, but is slower for regular parallel memory accesses. Given these results, the GGArray is a potentially useful structure for applications whose memory usage is highly uncertain, as well as for applications that run in phases, such as an insertion phase followed by a regular GPU phase. In such cases, the GGArray can be used for the first phase, and the data can then be flattened for the second phase to allow the faster, classical GPU memory accesses. These results constitute a step towards an efficient, parallel, C++-like vector for modern GPU architectures.
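The two-phase usage described above (insert into per-block growable arrays, then flatten for regular accesses) can be illustrated with a minimal, sequential CPU sketch in C++. It is not the authors' implementation. The abstract does not specify the internal layout, so the sketch assumes doubling-size buckets in the style of lock-free vector designs, and it omits the GPU threads, atomics, and block-level synchronization a real GGArray would need. The names `BucketedArray`, `kFirstBucket`, `flatten`, and `insert_then_flatten_demo` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for one growable array (one per "thread block").
// Assumption: bucket b holds kFirstBucket << b elements, so growing only
// allocates a new bucket and never copies or moves existing elements.
template <typename T>
class BucketedArray {
public:
    static constexpr std::size_t kFirstBucket = 8;  // assumed size of bucket 0

    void push_back(const T& value) {
        const std::size_t b = bucket_of(size_);
        if (b == buckets_.size()) {
            // Grow: add one bucket; earlier buckets stay where they are.
            buckets_.push_back(std::make_unique<T[]>(kFirstBucket << b));
        }
        buckets_[b][offset_in(size_, b)] = value;
        ++size_;
    }

    const T& at(std::size_t i) const {
        assert(i < size_);
        const std::size_t b = bucket_of(i);
        return buckets_[b][offset_in(i, b)];
    }

    std::size_t size() const { return size_; }

private:
    // Bucket b covers shifted positions [kFirstBucket << b, kFirstBucket << (b + 1)).
    static std::size_t bucket_of(std::size_t i) {
        const std::size_t pos = i + kFirstBucket;
        std::size_t b = 0;
        while ((kFirstBucket << (b + 1)) <= pos) ++b;
        return b;
    }

    static std::size_t offset_in(std::size_t i, std::size_t b) {
        return i + kFirstBucket - (kFirstBucket << b);
    }

    std::vector<std::unique_ptr<T[]>> buckets_;
    std::size_t size_ = 0;
};

// Second phase: copy every per-block array into one contiguous buffer,
// which then supports plain indexed accesses.
template <typename T>
std::vector<T> flatten(const std::vector<BucketedArray<T>>& per_block) {
    std::size_t total = 0;
    for (const auto& a : per_block) total += a.size();
    std::vector<T> flat;
    flat.reserve(total);
    for (const auto& a : per_block) {
        for (std::size_t i = 0; i < a.size(); ++i) flat.push_back(a.at(i));
    }
    return flat;
}

// Usage: four simulated blocks insert ten values each, then flatten.
inline std::vector<int> insert_then_flatten_demo() {
    std::vector<BucketedArray<int>> per_block(4);
    for (std::size_t b = 0; b < per_block.size(); ++b) {
        for (int k = 0; k < 10; ++k) {
            per_block[b].push_back(static_cast<int>(b) * 100 + k);
        }
    }
    return flatten(per_block);
}
```

In this sketch every element access first has to locate its bucket, while the flattened buffer is indexed directly. That extra step may be one reason a bucketed layout is slower for regular accesses, though the abstract itself does not state the cause.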