The hybrid ensemble, an essential branch of ensemble learning, has flourished in the regression field, and studies have confirmed the importance of diversity. However, previous ensembles consider diversity only in the sub-model training stage, yielding limited improvement over single models. In contrast, this study automatically selects and weights sub-models from a heterogeneous model pool by solving an optimization problem with an interior-point filter line-search algorithm. The objective function innovatively incorporates negative correlation learning (NCL) as a penalty term, with which a diverse model subset can be selected. The best sub-models from each model class are selected to build the NCL ensemble, whose performance surpasses the simple average and other state-of-the-art weighting methods. The NCL ensemble can be further improved by adding a regularization term to the objective function. In practice, it is difficult to determine the optimal sub-model for a dataset a priori due to model uncertainty. Nevertheless, our method achieves accuracy comparable to that of the potentially optimal sub-models. In conclusion, the value of this study lies in its ease of use and effectiveness, allowing the hybrid ensemble to embrace both diversity and accuracy.
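To make the idea concrete, below is a minimal sketch of selecting and weighting sub-models with an NCL-style penalty. The exact objective, penalty form, and lambda value used in the study are not given here, so the formulation is an illustrative assumption; likewise, the study uses an interior-point filter line-search algorithm, whereas this sketch substitutes scipy's readily available SLSQP solver. The names `ncl_objective`, `fit_weights`, `preds`, and `lam` are hypothetical.

```python
# Minimal sketch (assumptions noted above): weight sub-models on validation
# data by minimizing ensemble error minus an NCL-style diversity reward.
import numpy as np
from scipy.optimize import minimize

def ncl_objective(w, preds, y, lam=0.5):
    """preds: (n_models, n_samples) sub-model predictions; y: targets.
    Illustrative objective: squared ensemble error minus an NCL diversity term."""
    f_bar = w @ preds                        # weighted ensemble prediction
    mse = np.mean((f_bar - y) ** 2)          # accuracy term
    # NCL-style penalty: reward sub-models that deviate from the ensemble mean
    diversity = np.mean(w @ (preds - f_bar) ** 2)
    # A regularization term (e.g. + mu * np.dot(w, w)) could be added here,
    # as the study notes it can further improve the NCL ensemble.
    return mse - lam * diversity

def fit_weights(preds, y, lam=0.5):
    n = preds.shape[0]
    w0 = np.full(n, 1.0 / n)                 # start from the simple average
    cons = ({'type': 'eq', 'fun': lambda w: w.sum() - 1.0},)  # weights sum to 1
    bounds = [(0.0, 1.0)] * n                # non-negative weights
    res = minimize(ncl_objective, w0, args=(preds, y, lam),
                   method='SLSQP', bounds=bounds, constraints=cons)
    return res.x
```

In this sketch, sub-models that add no accuracy or diversity receive weights near zero, so selection and weighting happen in a single optimization; a larger `lam` trades individual accuracy for diversity among the selected subset.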