The two primary approaches to high-dimensional regression problems are sparse methods (e.g., best subset selection, which uses the L0-norm in the penalty) and ensemble methods (e.g., random forests). Although sparse methods typically yield interpretable models, they are often outperformed in prediction accuracy by "black-box" multi-model ensemble methods. We propose an algorithm to optimize an ensemble of L0-penalized regression models by extending recent developments in L0-optimization for sparse methods to multi-model regression ensembles. The sparse and diverse models in the ensemble are learned simultaneously from the data, and each model provides an explanation for the relationship between a subset of predictors and the response variable. We show how the ensembles achieve excellent prediction accuracy by exploiting the accuracy-diversity tradeoff of ensembles, and we investigate the effect of the number of models. In prediction tasks, the ensembles can outperform state-of-the-art competitors on both simulated and real data. We also generalize forward stepwise regression to multi-model regression ensembles and use it to obtain an initial solution for our algorithm. The optimization algorithms are implemented in publicly available software packages.
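
The accuracy-diversity tradeoff referred to above can be illustrated with the classical ambiguity decomposition for an equally weighted ensemble: the squared error of the averaged prediction equals the average squared error of the individual models minus their average squared deviation from the ensemble mean. The sketch below is a generic illustration of that identity, not the proposed algorithm; the function name `ambiguity_decomposition` and the toy numbers are hypothetical.

```python
from statistics import fmean


def ambiguity_decomposition(predictions, y):
    """Decompose the squared error of an equally weighted ensemble.

    predictions: per-model predictions for a single observation (hypothetical input).
    y: the observed response for that observation.
    Returns (ensemble_error, avg_model_error, diversity), which satisfy
    ensemble_error == avg_model_error - diversity up to rounding.
    """
    ensemble = fmean(predictions)
    # Accuracy term: average squared error of the individual models.
    avg_model_error = fmean([(p - y) ** 2 for p in predictions])
    # Diversity term: average squared spread of the models around the ensemble mean.
    diversity = fmean([(p - ensemble) ** 2 for p in predictions])
    ensemble_error = (ensemble - y) ** 2
    return ensemble_error, avg_model_error, diversity


# Toy example: three models predicting one response value.
ens_err, avg_err, div = ambiguity_decomposition([2.0, 3.0, 4.0], y=3.5)
print(ens_err, avg_err - div)
```

For fixed individual accuracy, more disagreement among the models lowers the ensemble error, which is why learning models that are both accurate and diverse can pay off.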