A novel method is proposed to learn an ensemble of logistic classification models in the context of high-dimensional binary classification. The models in the ensemble are built simultaneously by optimizing a multi-convex objective function. To enforce diversity between the models, the objective function penalizes overlap between the models in the ensemble. We study the bias and variance of the individual models as well as their correlation, and discuss how our method learns the ensemble by exploiting the accuracy-diversity trade-off for ensemble models. In contrast to other ensembling approaches, the resulting ensemble model is fully interpretable as a logistic regression model and at the same time yields excellent prediction accuracy, as demonstrated in an extensive simulation study and gene expression data applications. An open-source compiled software library implementing the proposed method is briefly discussed.
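To make the idea of an overlap-penalized ensemble objective concrete, the following is a minimal sketch and not the implementation in the accompanying library. It assumes that "overlap" is measured by the products of absolute coefficients that two models assign to the same feature, and that the ensemble is formed by averaging the models' linear predictors. Both are illustrative assumptions, as are all function and parameter names (`overlap_penalty`, `lam_diversity`, and so on). Under these assumptions the objective is convex in the coefficients of any one model when the others are held fixed, which is one way a multi-convex objective can arise.

```python
import numpy as np


def logistic_loss(X, y, beta0, beta):
    """Mean negative log-likelihood of one logistic model, labels y in {0, 1}."""
    z = beta0 + X @ beta
    # log(1 + exp(z)) evaluated stably with logaddexp
    return np.mean(np.logaddexp(0.0, z) - y * z)


def overlap_penalty(B):
    """Assumed diversity penalty: sum over features j and model pairs g < h
    of |B[j, g]| * |B[j, h]|. B has shape (n_features, n_models)."""
    A = np.abs(B)
    row_sum = A.sum(axis=1)
    # (sum_g a_g)^2 - sum_g a_g^2 = 2 * sum_{g<h} a_g * a_h
    return 0.5 * np.sum(row_sum ** 2 - np.sum(A ** 2, axis=1))


def ensemble_objective(X, y, intercepts, B, lam_diversity):
    """Sum of per-model logistic losses plus the weighted overlap penalty."""
    n_models = B.shape[1]
    fit = sum(logistic_loss(X, y, intercepts[g], B[:, g]) for g in range(n_models))
    return fit + lam_diversity * overlap_penalty(B)


def ensemble_predict_proba(X, intercepts, B):
    """Assumed ensembling rule: average the linear predictors, so the
    ensemble is itself a single logistic regression model."""
    z = np.mean(intercepts) + X @ B.mean(axis=1)
    return 1.0 / (1.0 + np.exp(-z))
```

With this penalty, two models that use disjoint sets of features incur no cost, while models that place weight on the same feature are penalized in proportion to the product of the magnitudes. Because the averaged coefficients define one logistic model, the ensemble remains interpretable through a single coefficient vector `B.mean(axis=1)`.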