Machine learning algorithms must be able to cope efficiently with massive data sets. Therefore, they have to scale well on any modern system and be able to exploit the computing power of accelerators independent of their vendor. In the field of supervised learning, Support Vector Machines (SVMs) are widely used. However, even modern and optimized implementations such as LIBSVM or ThunderSVM do not scale well for large, non-trivial, dense data sets on cutting-edge hardware: most SVM implementations are based on Sequential Minimal Optimization, an optimized though inherently sequential algorithm. Hence, they are not well suited for highly parallel GPUs. Furthermore, we are not aware of a performance-portable implementation that supports CPUs and GPUs from different vendors. We have developed the PLSSVM library to solve both issues. First, we resort to the formulation of the SVM as a least squares problem. Training an SVM then boils down to solving a system of linear equations, for which highly parallel algorithms are known. Second, we provide a hardware-independent yet efficient implementation: PLSSVM uses different interchangeable backends (OpenMP, CUDA, OpenCL, and SYCL), supporting modern hardware from various vendors such as NVIDIA, AMD, or Intel on multiple GPUs. PLSSVM can be used as a drop-in replacement for LIBSVM. We observe a speedup of up to 10 on CPUs compared to LIBSVM and of up to 14 on GPUs compared to ThunderSVM. Our implementation scales on many-core CPUs with a parallel speedup of 74.7 on up to 256 CPU threads and on multiple GPUs with a parallel speedup of 3.71 on four GPUs. The code, utility scripts, and documentation are all available on GitHub: https://github.com/SC-SGS/PLSSVM.
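To illustrate how least squares SVM training reduces to linear algebra, the following is a minimal sketch in plain Python. It is not the PLSSVM API or the library's implementation. It assumes the textbook LS-SVM system with labels in {-1, +1}, a linear kernel, and a regularization parameter `C`. It also assumes conjugate gradients as one example of a parallelizable solver, and removes the bias term `b` with a standard two-solve trick. All function names are hypothetical.

```python
# Illustrative LS-SVM sketch; names and structure are assumptions, not PLSSVM code.
# Textbook system: (K + I/C) * alpha + b * 1 = y  and  sum(alpha) = 0.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    # Dense matrix-vector product; this is the part that parallelizes well.
    return [dot(row, x) for row in A]

def linear_kernel(u, v):
    return dot(u, v)

def conjugate_gradient(A, rhs, tol=1e-10, max_iter=1000):
    # Solves A x = rhs for a symmetric positive definite matrix A.
    x = [0.0] * len(rhs)
    r = list(rhs)
    p = list(r)
    rs_old = dot(r, r)
    threshold = tol * tol * dot(rhs, rhs)
    for _ in range(max_iter):
        if rs_old <= threshold:
            break
        Ap = matvec(A, p)
        step = rs_old / dot(p, Ap)
        x = [xi + step * pi for xi, pi in zip(x, p)]
        r = [ri - step * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x

def train_lssvm(X, y, C=10.0, kernel=linear_kernel):
    n = len(X)
    # A = K + I/C is symmetric positive definite for a valid kernel and C > 0.
    A = [[kernel(X[i], X[j]) + (1.0 / C if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    eta = conjugate_gradient(A, [1.0] * n)              # A * eta = 1
    nu = conjugate_gradient(A, [float(v) for v in y])   # A * nu  = y
    b = sum(nu) / sum(eta)                              # enforces sum(alpha) = 0
    alpha = [v - b * e for v, e in zip(nu, eta)]
    return alpha, b

def predict(X, alpha, b, x, kernel=linear_kernel):
    score = sum(a * kernel(xi, x) for a, xi in zip(alpha, X)) + b
    return 1 if score >= 0 else -1

# Tiny usage example on two well-separated clusters.
X_demo = [[0, 0], [0, 1], [1, 0], [3, 3], [3, 4], [4, 3]]
y_demo = [-1, -1, -1, 1, 1, 1]
alpha_demo, b_demo = train_lssvm(X_demo, y_demo)
labels_demo = [predict(X_demo, alpha_demo, b_demo, x) for x in X_demo]
```

In this sketch, the only expensive operation is the repeated matrix-vector product inside the solver. That product has no sequential dependency between rows, unlike the pairwise updates of Sequential Minimal Optimization, which is why this formulation maps well onto GPUs.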