GPGPUs use the Single-Instruction-Multiple-Thread (SIMT) execution model, in which a group of threads (a wavefront or warp) executes instructions in lockstep. When the threads in a group encounter a branch instruction, they may not all take the same path, a phenomenon known as control-flow divergence. Control-flow divergence degrades performance because both paths of the branch must be executed one after the other. Prior research has primarily addressed this issue through architectural modifications. We observe that certain GPGPU kernels with control-flow divergence have similar control-flow structures with similar instructions on both sides of a branch. This structure can be exploited to reduce control-flow divergence by melding the two sides of the branch, which allows threads to reconverge early. In this work, we present DARM, a compiler analysis and transformation framework that can meld divergent control-flow structures with similar instruction sequences. We show that DARM can reduce the performance degradation from control-flow divergence.
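To make the idea of melding concrete, the following is a minimal hand-written sketch in plain C++, not output produced by DARM. It assumes a branch whose two sides both perform a multiply followed by an add and differ only in their operands. The function names `branch_divergent` and `branch_melded`, the constants, and the even/odd branch condition are all hypothetical and chosen only for illustration.

```cpp
#include <cassert>

// Original form: even and odd thread IDs take different paths, so a SIMT
// machine would execute the two sides one after the other.
// (Hypothetical example; the constants 3 and 5 are arbitrary.)
int branch_divergent(int tid, int a, int b) {
    int r;
    if (tid % 2 == 0) {
        r = a * 3 + tid;   // "then" side: multiply, then add
    } else {
        r = b * 5 + tid;   // "else" side: same instruction sequence, other operands
    }
    return r;
}

// Melded form: the differing operands are chosen with selects, and the
// shared multiply-add sequence appears once, so every thread follows
// the same instruction stream.
int branch_melded(int tid, int a, int b) {
    const bool even = (tid % 2 == 0);
    const int x = even ? a : b;   // select the input operand
    const int k = even ? 3 : 5;   // select the constant operand
    return x * k + tid;           // shared multiply-add, executed by all threads
}
```

Both functions return the same value for every `tid`. The sketch only illustrates the shape of the transformation: whether melding pays off in practice depends on how many instructions the two sides share relative to the selects that have to be added.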