Heterogeneous ensembles built from the predictions of a large and diverse set of base predictors are a powerful approach to predictive modeling for problems where the ideal base (individual) predictor is not obvious. Ensemble selection is especially promising here. It can improve predictive performance, and it can also select a collectively predictive subset of the base predictors, often a relatively small one. In this paper, we present a set of algorithms that explicitly incorporate ensemble diversity, a known factor influencing the predictive performance of ensembles, into a reinforcement learning framework for ensemble selection. We rigorously tested these approaches on several challenging problems and associated data sets. Several of them produced more accurate ensembles than approaches that do not explicitly consider diversity. More importantly, the diversity-incorporating ensembles were much smaller, i.e., more parsimonious, than those built without explicit consideration of diversity. This can eventually aid the interpretation or reverse engineering of the predictive models assimilated into the resultant ensemble(s).
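To make the idea of diversity-aware, reinforcement-learning-based ensemble selection more concrete, here is a minimal Python sketch. It is not the authors' algorithm. Tabular Q-learning, majority voting over binary labels, mean pairwise disagreement as the diversity measure, and the weight `lam` are all assumptions chosen for illustration, and every function name is hypothetical. The state is the current subset of base predictors. An action either adds one unused predictor or stops. The reward for an addition is the change in a hypothetical objective: validation accuracy plus `lam` times diversity.

```python
import random
from itertools import combinations


def vote(members, preds):
    """Majority vote over binary (0/1) predictions; ties are broken toward 1."""
    n = len(preds[0])
    return [int(2 * sum(preds[m][i] for m in members) >= len(members)) for i in range(n)]


def accuracy(members, preds, y):
    """Validation accuracy of the voted ensemble (0.0 for an empty ensemble)."""
    if not members:
        return 0.0
    yhat = vote(members, preds)
    return sum(a == b for a, b in zip(yhat, y)) / len(y)


def diversity(members, preds):
    """Mean pairwise disagreement between members (0.0 with fewer than two)."""
    pairs = list(combinations(members, 2))
    if not pairs:
        return 0.0
    n = len(preds[0])
    total = sum(sum(preds[a][i] != preds[b][i] for i in range(n)) / n for a, b in pairs)
    return total / len(pairs)


def score(members, preds, y, lam):
    # Hypothetical objective: accuracy plus a weighted diversity bonus.
    return accuracy(members, preds, y) + lam * diversity(members, preds)


def q_learning_selection(preds, y, lam=0.1, episodes=500, alpha=0.5,
                         gamma=1.0, eps=0.2, seed=0):
    """Hypothetical sketch: tabular Q-learning over subsets of base predictors."""
    rng = random.Random(seed)
    k = len(preds)
    STOP = -1
    q = {}  # (state, action) -> value; a state is a frozenset of predictor indices

    def actions(state):
        # Unused predictors first, then STOP; max() keeps the first of tied actions.
        return [a for a in range(k) if a not in state] + [STOP]

    def best(state):
        return max(actions(state), key=lambda a: q.get((state, a), 0.0))

    for _ in range(episodes):
        state = frozenset()
        while True:
            # Epsilon-greedy action choice.
            a = rng.choice(actions(state)) if rng.random() < eps else best(state)
            if a == STOP:
                # Terminal action: reward 0 and no future value.
                old = q.get((state, a), 0.0)
                q[(state, a)] = old + alpha * (0.0 - old)
                break
            nxt = state | {a}
            # Reward is the improvement in the objective from adding predictor a.
            r = score(nxt, preds, y, lam) - score(state, preds, y, lam)
            future = max(q.get((nxt, b), 0.0) for b in actions(nxt))
            old = q.get((state, a), 0.0)
            q[(state, a)] = old + alpha * (r + gamma * future - old)
            state = nxt

    # Roll out the learned greedy policy to obtain the selected ensemble.
    state = frozenset()
    while True:
        a = best(state)
        if a == STOP:
            return sorted(state)
        state = state | {a}
```

Because each reward is a difference in the objective, the undiscounted return of an episode (with `gamma=1.0`) equals the objective of the final ensemble. The greedy policy therefore aims at a high-scoring subset. Setting `lam=0` recovers accuracy-only selection. The state space has 2^k subsets for k base predictors, so a tabular sketch like this is only practical for small k. Large libraries of base predictors would need a different state representation.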