One of the most efficient methods for solving L2-regularized primal problems, such as logistic regression and linear support vector machine (SVM) classification, is the widely used trust region Newton algorithm, TRON. While TRON has recently been shown to enjoy substantial speedups on shared-memory multi-core systems, exploiting graphics processing units (GPUs) to accelerate the method is significantly more difficult, owing to the highly complex and heavily sequential nature of the algorithm. In this work, we show that with judicious GPU-optimization principles, TRON training time for different losses and feature representations may be drastically reduced. For sparse feature sets, we show that using GPUs to train logistic regression classifiers in LIBLINEAR is up to an order of magnitude faster than using multithreading alone. For dense feature sets, which impose far more stringent memory constraints, we show that GPUs substantially reduce the lengthy SVM learning times required for state-of-the-art proteomics analysis, yielding dramatic improvements over recently proposed speedups. Furthermore, we show how GPU speedups may be mixed with multithreading to retain these gains when a dataset is too large to fit in GPU memory; on a massive dense proteomics dataset of nearly a quarter-billion data instances, these mixed-architecture speedups reduce SVM analysis time from over half a week to less than a single day while using limited GPU memory.
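For concreteness, a minimal statement of the two primal problems referenced above, assuming LIBLINEAR's standard conventions (training instances $x_i \in \mathbb{R}^n$ with labels $y_i \in \{-1,+1\}$, penalty parameter $C > 0$, and the L2-loss, i.e. squared-hinge, SVM variant, whose differentiability is what makes Newton-type solvers such as TRON applicable):
\[
\min_{w}\;\tfrac{1}{2}\,w^{\top}w \;+\; C\sum_{i=1}^{l}\log\!\bigl(1 + e^{-y_i w^{\top} x_i}\bigr)
\qquad \text{(logistic regression)}
\]
\[
\min_{w}\;\tfrac{1}{2}\,w^{\top}w \;+\; C\sum_{i=1}^{l}\max\bigl(0,\; 1 - y_i w^{\top} x_i\bigr)^{2}
\qquad \text{(L2-loss linear SVM)}
\]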