Enhanced Cyclic Coordinate Descent Methods for Elastic Net Penalized Linear Models

We present a novel enhanced cyclic coordinate descent (ECCD) framework for solving generalized linear models with elastic net constraints that reduces training time compared to existing state-of-the-art methods. We redesign the CD method by performing a Taylor expansion around the current iterate to avoid nonlinear operations arising in the gradient computation. This approximation allows us to unroll the vector recurrences occurring in the CD method and reformulate the resulting computations into more efficient batched computations. We show empirically that the recurrence can be unrolled by a tunable integer parameter, $s$: $s > 1$ yields performance improvements without affecting convergence, whereas $s = 1$ recovers the original CD method. A key advantage of ECCD is that it avoids the convergence delay and numerical instability exhibited by block coordinate descent. Finally, we implement the proposed method in C++, using Eigen to accelerate linear algebra computations. Compared with existing state-of-the-art solvers, our method consistently improves performance, by $3\times$ on average for the regularization path variant across diverse benchmark datasets. Our implementation is available at https://github.com/Yixiao-Wang-Stats/ECCD.
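
To make the $s$-step idea concrete, the following is a minimal pure-Python sketch, not the authors' C++/Eigen implementation. It assumes the Gaussian (least-squares) case with the glmnet-style objective $\frac{1}{2n}\lVert y - X\beta\rVert_2^2 + \lambda\big(\alpha\lVert\beta\rVert_1 + \frac{1-\alpha}{2}\lVert\beta\rVert_2^2\big)$. Because this loss is already quadratic, no Taylor expansion is needed and the unrolling is exact. Plain cyclic CD sets $\beta_j \leftarrow S(z_j, \lambda\alpha)/(\frac{1}{n}\lVert x_j\rVert_2^2 + \lambda(1-\alpha))$ with $z_j = \frac{1}{n}x_j^\top r + \frac{1}{n}\lVert x_j\rVert_2^2\beta_j$. Here $S$ is soft-thresholding, and the residual $r = y - X\beta$ must be refreshed after every coordinate. The unrolled variant processes $s$ consecutive coordinates $j_1,\dots,j_s$ at a time. It computes $g_k = \frac{1}{n}x_{j_k}^\top r$ and the Gram block $G_{kl} = \frac{1}{n}x_{j_k}^\top x_{j_l}$ in one batch, then recovers each in-block gradient as $g_k - \sum_{l<k} G_{kl}\delta_l$, where $\delta_l$ is the change already made to $\beta_{j_l}$. The names `soft_threshold` and `cd_elastic_net` are hypothetical.

```python
# Hypothetical sketch: cyclic coordinate descent for elastic-net least squares,
# with an optional s-step unrolling of the residual recurrence. Pure Python.


def soft_threshold(z, t):
    """S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def cd_elastic_net(X, y, lam, alpha, s=1, epochs=100):
    """Minimize (1/2n)||y - X b||^2 + lam * (alpha*||b||_1 + (1-alpha)/2 * ||b||_2^2).

    X is a list of n rows of length p. s = 1 is plain cyclic CD; s > 1 handles
    s consecutive coordinates per block from one batch of inner products.
    Assumes every column of X is nonzero or alpha < 1 (denominators stay positive).
    """
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]  # column-major copy
    beta = [0.0] * p
    r = list(y)  # residual y - X beta, with beta = 0 initially
    for _ in range(epochs):
        for start in range(0, p, s):
            block = list(range(start, min(start + s, p)))
            # Batched step: inner products with the block-start residual and the
            # Gram block (recomputed here for brevity; a real solver would cache it).
            g = [sum(cols[j][i] * r[i] for i in range(n)) / n for j in block]
            G = [[sum(cols[a][i] * cols[c][i] for i in range(n)) / n for c in block]
                 for a in block]
            delta = [0.0] * len(block)
            for k, j in enumerate(block):
                # x_j^T r / n for the residual *after* the earlier in-block updates.
                grad = g[k] - sum(G[k][l] * delta[l] for l in range(k))
                z = grad + G[k][k] * beta[j]
                new = soft_threshold(z, lam * alpha) / (G[k][k] + lam * (1 - alpha))
                delta[k] = new - beta[j]
                beta[j] = new
            # Apply all of the block's coordinate changes to the residual together.
            for k, j in enumerate(block):
                for i in range(n):
                    r[i] -= cols[j][i] * delta[k]
    return beta


# Usage: s = 1 and s = 3 give the same coefficients up to rounding.
X = [[1.0, 2.0, 0.5], [0.0, 1.0, 1.5], [2.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
y = [3.0, 2.0, 2.5, 2.0]
b1 = cd_elastic_net(X, y, lam=0.1, alpha=0.5, s=1)
b3 = cd_elastic_net(X, y, lam=0.1, alpha=0.5, s=3)
print(b1)
print(b3)
```

With `s=1` the function reduces to ordinary cyclic CD. Larger `s` produces the same iterates up to floating-point rounding but groups each block's inner products into one batched computation. A practical solver would cache the Gram blocks across epochs and use BLAS-backed matrix products instead of Python loops.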

Related Content

Coordinate Descent

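Coordinate descent is a derivative-free (non-gradient) optimization algorithm. In each iteration, it performs a one-dimensional search along a single coordinate direction from the current point to find a local minimum of the function along that direction. Over the course of the run it cycles through the different coordinate directions. For non-separable functions, the algorithm may fail to reach the optimum within a small number of iterations. To accelerate convergence, a suitable coordinate system can be used. One option is to obtain it through principal component analysis, so that the new coordinates are as mutually uncorrelated as possible.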
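
The effect of correlated coordinates can be illustrated with a small sketch. The function name `coordinate_descent` is hypothetical. It applies cyclic coordinate descent with exact one-dimensional minimization, which on a quadratic is equivalent to the Gauss–Seidel iteration, to $f(x) = \frac{1}{2}x^\top A x - b^\top x$. The minimizer along coordinate $i$ is $x_i = (b_i - \sum_{j\ne i}A_{ij}x_j)/A_{ii}$. With strongly correlated coordinates the iteration needs many sweeps. After rotating to the eigenvectors of $A$ (what PCA produces when $A$ is a covariance or Gram matrix), the problem becomes separable and a single sweep suffices.

```python
import math


def coordinate_descent(A, b, x0, tol=1e-10, max_sweeps=10_000):
    """Cyclic CD with exact 1-D minimization for f(x) = 0.5 x^T A x - b^T x.

    A must be symmetric positive definite. Returns (x, sweeps): the iterate and
    the number of full cycles over the coordinates until max |grad f| < tol.
    """
    n = len(b)
    x = list(x0)
    for sweep in range(1, max_sweeps + 1):
        for i in range(n):
            # Exact minimizer along coordinate i with the other coordinates fixed.
            rest = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - rest) / A[i][i]
        grad = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(n)]
        if max(abs(g) for g in grad) < tol:
            return x, sweep
    return x, max_sweeps


# Non-separable problem: the two coordinates are strongly correlated.
A = [[1.0, 0.9], [0.9, 1.0]]
b = [1.0, 2.0]
x_corr, sweeps_corr = coordinate_descent(A, b, [0.0, 0.0])

# Same problem in the eigenbasis Q = [[h, h], [h, -h]] of A: Q^T A Q = diag(1.9, 0.1).
h = 1.0 / math.sqrt(2.0)
A_rot = [[1.9, 0.0], [0.0, 0.1]]
b_rot = [h * (b[0] + b[1]), h * (b[0] - b[1])]  # Q^T b
y_rot, sweeps_rot = coordinate_descent(A_rot, b_rot, [0.0, 0.0])
x_rot = [h * (y_rot[0] + y_rot[1]), h * (y_rot[0] - y_rot[1])]  # back to x = Q y

print(sweeps_corr, sweeps_rot)
print(x_corr, x_rot)
```

In the correlated case each sweep shrinks the error only by a factor of $0.9^2 = 0.81$, so `sweeps_corr` is large. In the rotated coordinates `sweeps_rot` is 1, and mapping `y_rot` back with $x = Qy$ recovers the same minimizer.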