This paper considers the problem of supervised learning with linear methods when both features and labels can be corrupted, either in the form of heavy tailed data and/or corrupted rows. We introduce a combination of coordinate gradient descent as a learning algorithm together with robust estimators of the partial derivatives. This leads to robust statistical learning methods that have a numerical complexity nearly identical to non-robust ones based on empirical risk minimization. The main idea is simple: while robust learning with gradient descent requires the computational cost of robustly estimating the whole gradient to update all parameters, a parameter can be updated immediately using a robust estimator of a single partial derivative in coordinate gradient descent. We prove upper bounds on the generalization error of the algorithms derived from this idea, that control both the optimization and statistical errors with and without a strong convexity assumption of the risk. Finally, we propose an efficient implementation of this approach in a new python library called linlearn, and demonstrate through extensive numerical experiments that our approach introduces a new interesting compromise between robustness, statistical performance and numerical efficiency for this problem.
翻译:本文考虑了在特征和标签都可能被腐蚀的情况下,以线性方法监督学习的问题, 无论是以重尾数据和/ 或以腐蚀的行为形式。 我们引入了协调梯度下降作为学习算法的组合, 以及部分衍生物的稳健估计器。 这导致严格的统计学习方法, 其数字复杂性几乎与基于经验风险最小化的非野蛮方法相同。 主要的想法很简单: 稳健的梯度下降需要计算成本, 强力估计整个梯度以更新所有参数, 参数可以立即更新, 使用一个在协调梯度下降过程中的单一部分衍生物的稳健的估算器。 我们证明, 从这个概念中得出的算法在一般化错误上具有上限, 既控制优化, 也控制统计错误, 同时又没有对风险进行强烈的一致假设。 最后, 我们建议在名为 Linlearn 的新的python 图书馆中高效地实施这一方法, 并通过广泛的数字实验表明, 我们的方法在这一问题的稳健性、 统计性表现和数字效率之间提出了新的令人感兴趣的妥协。