Generalized Linear Models (GLMs) form a wide class of regression and classification models, where prediction is a function of a linear combination of the input variables. For statistical inference in high dimension, sparsity-inducing regularizations have proven useful while offering statistical guarantees. However, solving the resulting optimization problems can be challenging: even for popular iterative algorithms such as coordinate descent, one needs to loop over a large number of variables. To mitigate this, techniques known as screening rules and working sets diminish the size of the optimization problem at hand, either by progressively removing variables or by solving a growing sequence of smaller problems. For both techniques, significant variables are identified thanks to convex duality arguments. In this paper, we show that the dual iterates of a GLM exhibit a Vector AutoRegressive (VAR) behavior after sign identification, when the primal problem is solved with proximal gradient descent or cyclic coordinate descent. Exploiting this regularity, one can construct dual points that offer tighter certificates of optimality, enhancing the performance of screening rules and helping to design competitive working set algorithms.
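To make the idea of extrapolated dual points concrete, the sketch below illustrates it on the Lasso, the simplest sparse GLM. It is a minimal NumPy illustration, not the paper's implementation: the helper names (`dual_point`, `lasso_gap`, `extrapolated_residual`) are hypothetical, and the extrapolation shown is a generic Anderson-style combination of past residuals, which is the kind of regularity the VAR behavior justifies. A feasible dual point is obtained by rescaling a residual, the duality gap serves as the optimality certificate, and the extrapolated residual is a weighted combination of recent residuals with weights from a small least-squares system.

```python
import numpy as np

def dual_point(X, r, lam):
    # Rescale a residual r = y - X @ w so that ||X.T @ theta||_inf <= 1,
    # which makes theta feasible for the Lasso dual problem.
    return r / max(lam, np.abs(X.T @ r).max())

def lasso_gap(X, y, w, theta, lam):
    # Duality gap P(w) - D(theta): a gap below eps certifies eps-optimality,
    # and a smaller gap means a tighter certificate.
    p_obj = 0.5 * np.linalg.norm(y - X @ w) ** 2 + lam * np.abs(w).sum()
    d_obj = 0.5 * np.linalg.norm(y) ** 2 \
        - 0.5 * np.linalg.norm(y - lam * theta) ** 2
    return p_obj - d_obj

def extrapolated_residual(R):
    # R: (K, n) array of the K most recent residuals r_k = y - X @ w_k.
    # After sign identification the iterates follow a VAR, so a fixed
    # combination sum_k c_k r_k can converge faster than the last residual.
    U = np.diff(R, axis=0)                # (K-1, n) successive differences
    try:
        z = np.linalg.solve(U @ U.T, np.ones(U.shape[0]))
    except np.linalg.LinAlgError:
        return R[-1]                      # singular system: fall back
    c = z / z.sum()                       # combination weights summing to 1
    return c @ R[:-1]                     # extrapolated residual, shape (n,)
```

A natural safeguard, under these assumptions, is to rescale both the last residual and the extrapolated one, then keep whichever dual point yields the smaller gap, so that extrapolation can never loosen the certificate used by screening rules or working set policies.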