Group lasso is a widely used regularization method in statistical learning that eliminates parameters from the model in predefined groups. When the groups overlap, however, optimizing the group lasso penalized objective can be time-consuming on large-scale problems because the overlapping groups make the penalty non-separable. This bottleneck has seriously limited the application of overlapping group lasso regularization to many modern problems, such as gene pathway selection and graphical model estimation. In this paper, we propose a separable penalty as an approximation of the overlapping group lasso penalty. Thanks to the separability, computing the regularized estimator with our penalty is substantially faster than with the overlapping group lasso, especially for large-scale and high-dimensional problems. We show that the penalty is the tightest separable relaxation of the overlapping group lasso norm within the family of $\ell_{q_1}/\ell_{q_2}$ norms. Moreover, we show that the estimator based on the proposed separable penalty is statistically equivalent to the one based on the overlapping group lasso penalty with respect to their error bounds and rate-optimal performance under the squared loss. We demonstrate the shorter computation time and the statistical equivalence of our method relative to the overlapping group lasso in simulation examples and in a classification problem of cancer tumors based on gene expression and multiple gene pathways.
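To illustrate why separability matters computationally, the following minimal sketch evaluates a group lasso penalty of the standard form $\sum_g w_g \|\beta_{G_g}\|_2$ and applies block soft-thresholding, the closed-form proximal step that is available only when the groups are disjoint. It is not the penalty proposed in the paper, whose exact form the abstract does not state. The function names, the toy coefficient vector, the group index lists, and the unit weights are all assumptions made for illustration.

```python
import numpy as np

def group_lasso_penalty(beta, groups, weights):
    """Weighted sum of Euclidean norms over index groups (groups may overlap)."""
    return sum(w * np.linalg.norm(beta[g]) for g, w in zip(groups, weights))

def prox_disjoint_groups(beta, groups, weights, step):
    """Block soft-thresholding. Valid only when the groups are disjoint,
    because each coordinate is then shrunk by exactly one group."""
    out = beta.copy()
    for g, w in zip(groups, weights):
        norm = np.linalg.norm(beta[g])
        # Shrink the whole block toward zero; zero it out if its norm is small.
        scale = max(0.0, 1.0 - step * w / norm) if norm > 0 else 0.0
        out[g] = scale * beta[g]
    return out

# Hypothetical toy data, chosen only for illustration.
beta = np.array([3.0, 4.0, 0.3, 0.4])
overlapping = [[0, 1], [1, 2, 3]]   # coordinate 1 belongs to two groups
disjoint = [[0, 1], [2, 3]]         # a partition of the coordinates
weights = [1.0, 1.0]

pen_overlap = group_lasso_penalty(beta, overlapping, weights)
pen_disjoint = group_lasso_penalty(beta, disjoint, weights)
shrunk = prox_disjoint_groups(beta, disjoint, weights, step=1.0)
```

With overlapping groups, a shared coordinate such as index 1 couples the two norm terms, so no per-group closed form of this kind applies and the proximal step generally has to be computed iteratively. With disjoint groups, each block is handled independently in a single pass.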