The simplex algorithm has been used successfully for many years to solve linear programming (LP) problems. Owing to the intensive computations required (especially for large LP instances), parallel approaches have also been studied extensively. The computational power of modern GPUs, together with the rapid development of multicore CPU systems, has made OpenMP and CUDA the preferred programming models in recent years. However, achieving efficient collaboration between the CPU and the GPU through the combined use of these programming models is still considered a hard research problem. In this context, we demonstrate a highly efficient implementation of the standard simplex method that aims at the best possible exploitation of all available computing resources, used concurrently, on a multicore platform equipped with multiple CUDA-enabled GPUs. More concretely, we present a novel hybrid collaboration scheme based on the concurrent execution of suitably distributed CPU-assigned (via multithreading) and GPU-offloaded computations. Experimental results obtained through the cooperative use of OpenMP and CUDA on a notably powerful modern hybrid platform (consisting of 32 cores and two high-end GPUs, a Titan RTX and an RTX 2080 Ti) show that the performance of the hybrid GPU/CPU collaboration scheme presented here is clearly superior to the GPU-only implementation under almost all conditions. The corresponding measurements validate the value of using all resources concurrently, even on a multi-GPU platform. Furthermore, the given implementations are fully comparable (and slightly superior in most cases) to other related attempts in the literature, and clearly superior to the CPU-only implementation with 32 cores.