Data-Driven Diverse Ensembles of Logistic Regression Models

A novel framework for statistical learning is introduced that combines ideas from regularization and ensembling. The framework is applied to learn an ensemble of logistic regression models for high-dimensional binary classification. The models in the ensemble are learned simultaneously by optimizing a multi-convex objective function. To enforce diversity, the objective function penalizes overlap between the models in the ensemble. Measures of diversity in classifier ensembles are used to show how the method learns the ensemble by exploiting the accuracy-diversity trade-off for ensemble models. In contrast to other ensembling approaches, the resulting ensemble model is fully interpretable as a logistic regression model and asymptotically consistent, and it yields excellent prediction accuracy, as demonstrated in an extensive simulation study and in applications to gene expression data. The models found by the proposed ensemble methodology can also reveal alternative mechanisms that explain the relationship between the predictors and the response variable. An open-source compiled software library implementing the proposed method is briefly discussed.
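To make the idea of an overlap-penalized ensemble objective concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not state the exact objective, so everything here is an assumption for illustration: the function names, the lasso-type sparsity term, the overlap penalty written as a sum of products of absolute coefficients across pairs of models, the tuning parameters `lam_sparsity` and `lam_diversity`, and the choice to combine the models by averaging their coefficients.

```python
import numpy as np


def logistic_loss(X, y, intercept, beta):
    """Average negative log-likelihood of one logistic model, labels y in {0, 1}."""
    eta = intercept + X @ beta
    # log(1 + exp(eta)) evaluated stably with logaddexp
    return np.mean(np.logaddexp(0.0, eta) - y * eta)


def overlap_penalty(B):
    """Assumed diversity term: sum over predictors j and model pairs g < h
    of |B[j, g]| * |B[j, h]|. It is zero when no predictor is shared."""
    A = np.abs(B)
    row_sums = A.sum(axis=1)
    # (sum_g a_g)^2 - sum_g a_g^2 equals 2 * sum_{g<h} a_g * a_h
    return 0.5 * np.sum(row_sums ** 2 - (A ** 2).sum(axis=1))


def ensemble_objective(X, y, intercepts, B, lam_sparsity, lam_diversity):
    """Hypothetical joint objective: fit of every model, plus a sparsity
    penalty, plus the overlap penalty. B has one column per model."""
    n_models = B.shape[1]
    fit = sum(logistic_loss(X, y, intercepts[g], B[:, g]) for g in range(n_models))
    sparsity = lam_sparsity * np.abs(B).sum()
    diversity = lam_diversity * overlap_penalty(B)
    return fit + sparsity + diversity


def ensemble_probabilities(X, intercepts, B):
    """Assumed combination rule: average the coefficients, so the ensemble
    is itself a single logistic regression model."""
    eta = np.mean(intercepts) + X @ B.mean(axis=1)
    return 1.0 / (1.0 + np.exp(-eta))
```

With this assumed penalty, fixing every model but one turns the overlap term into a weighted L1 penalty on the remaining model's coefficients, so each block subproblem is convex even though the joint problem is not. That is one way an objective can be multi-convex and suited to block coordinate descent.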
