Sparse regression and classification estimators that respect group structures have applications in an assortment of statistical and machine learning problems, from multitask learning to sparse additive modeling to hierarchical selection. This work introduces structured sparse estimators that combine group subset selection with shrinkage. To accommodate sophisticated structures, our estimators allow for arbitrary overlap between groups. We develop an optimization framework for fitting the nonconvex regularization surface and present finite-sample error bounds for estimation of the regression function. As an application requiring structure, we study sparse semiparametric modeling, a procedure that allows the effect of each predictor to be zero, linear, or nonlinear. For this task, the new estimators outperform alternatives across several metrics on synthetic data. Finally, we demonstrate their efficacy in modeling supermarket foot traffic and economic recessions using many predictors. These demonstrations suggest that sparse semiparametric models, fit using the new estimators, are an excellent compromise between fully linear and fully nonparametric alternatives. All of our algorithms are made available in the scalable implementation grpsel.