Fusing regression coefficients into homogeneous groups reveals which coefficients share a common value within each group. Such groupwise homogeneity reduces the intrinsic dimension of the parameter space and yields sharper statistical accuracy. We propose and investigate a new combinatorial grouping approach called $L_0$-Fusion that is amenable to mixed integer optimization (MIO). On the statistical side, we identify a fundamental quantity called grouping sensitivity that underpins the difficulty of recovering the true groups. We show that $L_0$-Fusion achieves grouping consistency under the weakest possible requirement on the grouping sensitivity: if this requirement is violated, then the minimax risk of group misspecification fails to converge to zero. Moreover, we show that in the high-dimensional regime, one can apply $L_0$-Fusion coupled with a sure screening set of features without any essential loss of statistical efficiency, while reducing the computational cost substantially. On the algorithmic side, we provide an MIO formulation for $L_0$-Fusion along with a warm start strategy. Simulations and real data analysis demonstrate that $L_0$-Fusion outperforms its competitors in grouping accuracy.
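To make the dimension-reduction claim concrete, the following is a minimal sketch of how groupwise homogeneity shrinks the parameter space once the groups are known. It is not the $L_0$-Fusion procedure itself, which must estimate the groups from data; the group labels, the toy design matrix, and the helper names `collapse_design` and `solve_least_squares` are all assumptions made for illustration. If coefficient $j$ belongs to group $g(j)$ and every group shares one value, then regressing on the group-summed columns estimates $K$ parameters instead of $p$.

```python
# Illustrative only: group labels are assumed known here, whereas a grouping
# method such as L0-Fusion would have to recover them from the data.

def collapse_design(X, groups, n_groups):
    """Sum the columns of X that belong to the same group."""
    collapsed = []
    for row in X:
        new_row = [0.0] * n_groups
        for j, value in enumerate(row):
            new_row[groups[j]] += value
        collapsed.append(new_row)
    return collapsed


def solve_least_squares(A, y):
    """Solve the normal equations (A^T A) g = A^T y by Gauss-Jordan elimination."""
    k = len(A[0])
    # Augmented matrix [A^T A | A^T y].
    M = [
        [sum(r[i] * r[j] for r in A) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(A, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]


# Toy data (hypothetical): p = 4 coefficients in K = 2 homogeneous groups.
groups = [0, 0, 1, 1]          # coefficient j belongs to group groups[j]
gamma_true = [2.0, -1.0]       # one shared value per group
beta_true = [gamma_true[g] for g in groups]

X = [
    [1.0, 0.0, 2.0, 1.0],
    [0.0, 1.0, 1.0, 3.0],
    [2.0, 1.0, 0.0, 1.0],
    [1.0, 1.0, 1.0, 0.0],
    [3.0, 0.0, 1.0, 2.0],
    [0.0, 2.0, 2.0, 1.0],
]
# Noise-free responses, so the fit should recover gamma_true up to rounding.
y = [sum(x * b for x, b in zip(row, beta_true)) for row in X]

# Fit K = 2 parameters on the collapsed design instead of p = 4 on X.
gamma_hat = solve_least_squares(collapse_design(X, groups, 2), y)
beta_hat = [gamma_hat[g] for g in groups]
```

The sketch only shows the payoff of knowing the groups: the fit involves two free parameters rather than four. The hard part addressed by the abstract, choosing the partition itself, is a combinatorial problem that this example deliberately leaves out.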