Methodology and optimization algorithms for sparse regression are extended to multi-model regression ensembles. In particular, we adapt optimization algorithms for $\ell_0$-penalized problems to learn ensembles of sparse and diverse models. To generate an initial solution for our algorithm, we generalize forward stepwise regression to multi-model regression ensembles. The sparse and diverse models are learned jointly from the data and constitute alternative explanations for the relationship between the predictors and the response variable. Beyond the advantage of interpretability, in prediction tasks the ensembles are shown to outperform state-of-the-art competitors on both simulated and gene expression data. We study the effect of the number of models and show how the ensembles achieve excellent prediction accuracy by exploiting the accuracy-diversity tradeoff of ensembles. The optimization algorithms are implemented in publicly available R/C++ software packages.
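To make the idea of generalizing forward stepwise regression to several models concrete, the following is a minimal sketch and not the authors' algorithm. It assumes centered data with no intercept, and it enforces diversity with the simplest possible rule: each predictor may enter at most one model. Both the rule and the names `multi_model_stepwise`, `residualize`, and `dot` are hypothetical and chosen for illustration only. The models take turns, and each greedily claims the unused predictor that most reduces its own residual sum of squares.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def residualize(v, basis):
    """Remove from v its projection onto each orthonormal vector in basis."""
    out = list(v)
    for q in basis:
        c = dot(out, q)
        out = [o - c * qi for o, qi in zip(out, q)]
    return out


def multi_model_stepwise(columns, y, n_models, model_size, tol=1e-12):
    """Greedy sketch: models take turns claiming the unused predictor
    that most reduces their own residual sum of squares (RSS).

    columns: list of p predictor columns, each a list of n values.
    Assumption (not from the source): a predictor enters at most one model.
    """
    residuals = [list(y) for _ in range(n_models)]  # current residual per model
    bases = [[] for _ in range(n_models)]           # orthonormal basis per model
    supports = [[] for _ in range(n_models)]        # selected predictor indices
    used = set()                                    # predictors already claimed
    for _ in range(model_size):
        for g in range(n_models):
            best = None  # (RSS reduction, index, orthogonalized column, its norm^2)
            for j, col in enumerate(columns):
                if j in used:
                    continue
                z = residualize(col, bases[g])
                zz = dot(z, z)
                if zz <= tol:  # column is (numerically) in the span already
                    continue
                gain = dot(z, residuals[g]) ** 2 / zz
                if best is None or gain > best[0]:
                    best = (gain, j, z, zz)
            if best is None:  # no admissible predictor left for this model
                continue
            _, j, z, zz = best
            coef = dot(z, residuals[g]) / zz
            residuals[g] = [r - coef * zi for r, zi in zip(residuals[g], z)]
            bases[g].append([zi / zz ** 0.5 for zi in z])
            supports[g].append(j)
            used.add(j)
    return supports


# Toy usage: four orthogonal predictors, two models of two predictors each.
columns = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
y = [4, 3, 2, 1]
supports = multi_model_stepwise(columns, y, n_models=2, model_size=2)
print(supports)
```

In this sketch the resulting supports are disjoint by construction, which is the most restrictive form of diversity. A softer rule, such as allowing each predictor in a bounded number of models, would trade some diversity for the accuracy of the individual models.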