Ensembles of machine learning models are well established as a powerful method of improving performance over a single model. Traditionally, ensembling algorithms train their base learners independently or sequentially with the goal of optimizing their joint performance. In the case of deep ensembles of neural networks, we are provided with the opportunity to directly optimize the true objective: the joint performance of the ensemble as a whole. Surprisingly, however, directly minimizing the loss of the ensemble appears to rarely be applied in practice. Instead, most previous research trains individual models independently, with ensembling performed post hoc. In this work, we show that this is for good reason: joint optimization of the ensemble loss results in degenerate behavior. We approach this problem by decomposing the ensemble objective into the strength of the base learners and the diversity between them. We discover that joint optimization results in a phenomenon in which base learners collude to artificially inflate their apparent diversity. This pseudo-diversity fails to generalize beyond the training data, causing a larger generalization gap. We proceed to demonstrate the practical implications of this effect, finding that, in some cases, a balance between independent training and joint optimization can improve performance over the former while avoiding the degeneracies of the latter.
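To make the contrast concrete, the sketch below (our own illustration, not code from the paper) shows the difference between independent training, where each base learner minimizes its own loss, and joint optimization, where the loss is taken on the averaged ensemble prediction. It assumes a PyTorch-style classification setup; the five-member ensemble, learner architecture, dummy batch, and logit averaging are all illustrative assumptions.

```python
# Hypothetical sketch: independent per-learner losses vs. a joint ensemble loss.
import torch
import torch.nn as nn

def make_learner(in_dim=10, out_dim=3):
    # Small MLP base learner (architecture chosen only for illustration).
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))

learners = [make_learner() for _ in range(5)]
params = [p for m in learners for p in m.parameters()]
optimizer = torch.optim.SGD(params, lr=0.1)
criterion = nn.CrossEntropyLoss()

x = torch.randn(32, 10)          # dummy inputs
y = torch.randint(0, 3, (32,))   # dummy labels

# Independent training: sum of per-learner losses; gradients do not couple learners.
loss_independent = sum(criterion(m(x), y) for m in learners)

# Joint optimization: loss of the averaged ensemble output; gradients couple the
# learners, which is where the collusion / pseudo-diversity effect described
# above can arise.
ensemble_logits = torch.stack([m(x) for m in learners]).mean(dim=0)
loss_joint = criterion(ensemble_logits, y)

optimizer.zero_grad()
loss_joint.backward()   # use loss_independent.backward() for independent training
optimizer.step()
```

A balance between the two regimes, as discussed above, would correspond to optimizing a weighted combination of these two losses rather than either one alone.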