We present a new algorithmic framework for grouped variable selection that is based on discrete mathematical optimization. While there exist several appealing approaches based on convex relaxations and nonconvex heuristics, we focus on optimal solutions for the $\ell_0$-regularized formulation, a problem that is relatively unexplored due to computational challenges. Our methodology covers both high-dimensional linear regression and nonparametric sparse additive modeling with smooth components. Our algorithmic framework consists of approximate and exact algorithms. The approximate algorithms are based on coordinate descent and local search, with runtimes comparable to popular sparse learning algorithms. Our exact algorithm is based on a standalone branch-and-bound (BnB) framework, which can solve the associated mixed integer programming (MIP) problem to certified optimality. By exploiting the problem structure, our custom BnB algorithm can solve problem instances with $5 \times 10^6$ features to optimality in minutes to hours -- over $1000$ times larger than what is currently possible using state-of-the-art commercial MIP solvers. We also explore statistical properties of the $\ell_0$-based estimators. We demonstrate, theoretically and empirically, that our proposed estimators have an edge over popular group-sparse estimators in terms of statistical performance in various regimes.
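To make the approximate approach concrete, the block coordinate descent idea for the group-$\ell_0$-regularized least-squares objective $\tfrac{1}{2}\|y - X\beta\|^2 + \lambda \sum_g \mathbf{1}[\beta_g \neq 0]$ can be sketched as follows. This is a minimal illustrative sketch, not the paper's optimized implementation; the function name, the per-block least-squares solve, and the thresholding rule (keep a group only when its loss reduction exceeds $\lambda$) are assumptions made for illustration.

```python
import numpy as np

def group_l0_cd(X, y, groups, lam, n_iters=50):
    """Block coordinate descent sketch for
        (1/2)||y - X beta||^2 + lam * (# nonzero groups).
    `groups` is a list of column-index arrays partitioning X's columns.
    Hypothetical illustration, not the paper's algorithm verbatim.
    """
    beta = np.zeros(X.shape[1])
    r = y.astype(float).copy()  # residual y - X @ beta (beta starts at 0)
    for _ in range(n_iters):
        for g in groups:
            # Add back group g's current contribution to the residual.
            r_g = r + X[:, g] @ beta[g]
            # Unpenalized least-squares fit for this block alone.
            b, *_ = np.linalg.lstsq(X[:, g], r_g, rcond=None)
            # Keep the group only if its loss reduction exceeds lam;
            # otherwise hard-threshold the whole block to zero.
            gain = 0.5 * np.sum(r_g**2) - 0.5 * np.sum((r_g - X[:, g] @ b)**2)
            beta[g] = b if gain > lam else 0.0
            r = r_g - X[:, g] @ beta[g]
    return beta
```

Each block update is cheap (one small least-squares solve), which is why such coordinate schemes scale to very large feature counts; local search would additionally consider swapping an active group for an inactive one.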