As GPU availability has increased and programming support has matured, a wider variety of applications are being ported to these platforms. Many parallel applications contain fine-grained synchronization idioms; as such, their correct execution depends on a degree of relative forward progress between threads (or thread groups). Unfortunately, many GPU programming specifications say almost nothing about relative forward progress guarantees between workgroups. Although prior work has proposed a spectrum of plausible progress models for GPUs, cross-vendor specifications have yet to commit to any model. This work is a collection of tools experimental data to aid specification designers when considering forward progress guarantees in programming frameworks. As a foundation, we formalize a small parallel programming language that captures the essence of fine-grained synchronization. We then provide a means of formally specifying a progress model, and develop a termination oracle that decides whether a given program is guaranteed to eventually terminate with respect to a given progress model. Next, we formalize a constraint for concurrent programs that require relative forward progress to terminate. Using this constraint, we synthesize a large set of 483 progress litmus tests. Combined with the termination oracle, this allows us to determine the expected status of each litmus test -- i.e. whether it is guaranteed eventual termination -- under various progress models. We present a large experimental campaign running the litmus tests across 8 GPUs from 5 different vendors. Our results highlight that GPUs have significantly different termination behaviors under our test suite. Most notably, we find that Apple and ARM GPUs do not support the linear occupancy-bound model, an intuitive progress model defined by prior work and hypothesized to describe the workgroup schedulers of existing GPUs.
翻译:随着 GPU 的可用性增加,编程支持也逐渐成熟,许多应用程序正在向这些平台输入更广泛的应用。许多平行应用程序包含精细的同步特性;因此,它们的正确执行取决于线(或线组)之间的相对前进进度。 不幸的是,许多 GPU 编程规格几乎没有提到工作组之间的相对前进保障。 虽然先前的工作为GPU提出了一系列可信的进展模型, 但交叉供应商的规格尚未对任何模式作出承诺。 这项工作是收集工具实验数据, 帮助规格设计者在考虑编程框架中的进度保障时使用。 作为基础, 我们正式确定一种小型的平行编程语言, 捕捉到精细的同步( 或线组) 的精髓。 然后我们提供一种手段, 正式指定一个进度模型, 并开发一个终止某个特定程序, 与某个特定的进度模型有关。 我们正式确定了一个平行的模型, 需要相对前进进度的制约。 利用这一制约, 我们综合了一套大型的进展标准。 结合了结束或结束, 我们把一个小的线级编程语言正式化语言语言 描述 G 整个 G 的G 开始一个测试 。