Efficient and robust high-dimensional sparse logistic regression via nonlinear primal-dual hybrid gradient algorithms

Logistic regression is a widely used statistical model to describe the relationship between a binary response variable and predictor variables in data sets. It is often used in machine learning to identify important predictor variables. This task, variable selection, typically amounts to fitting a logistic regression model regularized by a convex combination of $\ell_1$ and $\ell_{2}^{2}$ penalties. Since modern big data sets can contain hundreds of thousands to billions of predictor variables, variable selection methods depend on efficient and robust optimization algorithms to perform well. State-of-the-art algorithms for variable selection, however, were not traditionally designed to handle big data sets; they either scale poorly in size or are prone to produce unreliable numerical results. It therefore remains challenging to perform variable selection on big data sets without access to adequate and costly computational resources. In this paper, we propose a nonlinear primal-dual algorithm that addresses these shortcomings. Specifically, we propose an iterative algorithm that provably computes a solution to a logistic regression problem regularized by an elastic net penalty in $O(T(m,n)\log(1/\epsilon))$ operations, where $\epsilon \in (0,1)$ denotes the tolerance and $T(m,n)$ denotes the number of arithmetic operations required to perform matrix-vector multiplication on a data set with $m$ samples each comprising $n$ features. This result improves on the known complexity bound of $O(\min(m^2n,mn^2)\log(1/\epsilon))$ for first-order optimization methods such as the classic primal-dual hybrid gradient or forward-backward splitting methods.
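To make the problem setting concrete, the following is a minimal sketch of the classic forward-backward splitting baseline named above, not the nonlinear primal-dual algorithm proposed in the paper. It assumes labels $y_i \in \{-1,+1\}$ and the objective $\frac{1}{m}\sum_{i=1}^{m}\log(1+e^{-y_i\langle a_i,\theta\rangle}) + \lambda\left(\alpha\|\theta\|_1 + \frac{1-\alpha}{2}\|\theta\|_2^2\right)$. This parametrization, the names `lam` and `alpha`, and the step size taken from a Frobenius-norm bound are all assumptions made for illustration.

```python
import math


def soft_threshold(x, tau):
    """Proximal operator of tau * |x| (scalar soft-thresholding)."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def objective(A, y, theta, lam, alpha):
    """Mean logistic loss plus elastic net penalty (assumed parametrization)."""
    loss = 0.0
    for a_i, y_i in zip(A, y):
        z = -y_i * sum(a * t for a, t in zip(a_i, theta))
        # log(1 + exp(z)), evaluated without overflow
        loss += max(z, 0.0) + math.log1p(math.exp(-abs(z)))
    l1 = sum(abs(t) for t in theta)
    l2 = sum(t * t for t in theta)
    return loss / len(A) + lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2)


def forward_backward(A, y, lam, alpha, iters=500):
    """Forward-backward splitting (proximal gradient) from theta = 0."""
    m, n = len(A), len(A[0])
    # The loss gradient is Lipschitz with constant at most ||A||_F^2 / (4 m);
    # the Frobenius norm is a cheap upper bound on the spectral norm.
    frob_sq = sum(a * a for row in A for a in row)
    step = 4.0 * m / frob_sq
    theta = [0.0] * n
    for _ in range(iters):
        # Forward step: gradient of the mean logistic loss.
        grad = [0.0] * n
        for a_i, y_i in zip(A, y):
            margin = y_i * sum(a * t for a, t in zip(a_i, theta))
            w = -y_i * sigmoid(-margin) / m
            for j in range(n):
                grad[j] += w * a_i[j]
        # Backward step: closed-form prox of the elastic net penalty.
        shrink = 1.0 + step * lam * (1.0 - alpha)
        theta = [
            soft_threshold(t - step * g, step * lam * alpha) / shrink
            for t, g in zip(theta, grad)
        ]
    return theta


if __name__ == "__main__":
    # Hypothetical toy data: 4 samples, 2 features.
    A = [[2.0, 0.1], [1.5, -0.2], [-1.0, 0.3], [-2.5, -0.1]]
    y = [1, 1, -1, -1]
    theta = forward_backward(A, y, lam=0.1, alpha=0.5)
    print(theta, objective(A, y, theta, 0.1, 0.5))
```

Each iteration costs two passes over the data (one product with the data matrix and one with its transpose), which is the $T(m,n)$ term in the complexity bounds above. The sketch avoids estimating the largest singular value of the data matrix by using the Frobenius norm instead, which yields a more conservative step size.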
