Deep ensembles perform better than a single network thanks to the diversity among their members. Recent approaches regularize predictions to increase diversity; however, they also drastically decrease individual members' performance. In this paper, we argue that learning strategies for deep ensembles need to tackle the trade-off between ensemble diversity and individual accuracies. Motivated by arguments from information theory and leveraging recent advances in neural estimation of conditional mutual information, we introduce a novel training criterion called DICE: it increases diversity by reducing spurious correlations among features. The main idea is that features extracted from pairs of members should only share information useful for target class prediction, without being conditionally redundant. Therefore, besides the classification loss with information bottleneck, we adversarially prevent features from being conditionally predictable from each other. We manage to reduce simultaneous errors while protecting class information. We obtain state-of-the-art accuracy on CIFAR-10/100: for example, an ensemble of 5 networks trained with DICE matches an ensemble of 7 networks trained independently. We further analyze the consequences on calibration, uncertainty estimation, out-of-distribution detection and online co-distillation.
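To make the training objective concrete, below is a minimal, hypothetical PyTorch sketch of the structure described above: two ensemble members are trained with a classification loss plus an adversarial penalty on the conditional redundancy between their features, estimated by a critic network. The critic architecture, the Donsker-Varadhan-style bound, the in-batch shuffling used for "marginal" samples, and the 0.1 weight are all illustrative assumptions, not the paper's exact estimator or hyper-parameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Member(nn.Module):
    """One ensemble member: a feature extractor followed by a linear classifier."""
    def __init__(self, in_dim=32, feat_dim=16, n_classes=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, feat_dim))
        self.head = nn.Linear(feat_dim, n_classes)

    def forward(self, x):
        z = self.encoder(x)
        return z, self.head(z)

class ConditionalCritic(nn.Module):
    """Scores (z1, z2, y) triples; used to estimate conditional redundancy I(Z1; Z2 | Y)."""
    def __init__(self, feat_dim=16, n_classes=10):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * feat_dim + n_classes, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, z1, z2, y_onehot):
        return self.net(torch.cat([z1, z2, y_onehot], dim=1)).squeeze(1)

def conditional_mi_lower_bound(critic, z1, z2, y_onehot):
    """Donsker-Varadhan-style lower bound on I(Z1; Z2 | Y).

    Joint samples pair z1 and z2 from the same input; "marginal" samples shuffle
    z2 within the batch, which only approximates conditioning on Y."""
    joint = critic(z1, z2, y_onehot)
    perm = torch.randperm(z2.size(0))
    marginal = critic(z1, z2[perm], y_onehot)
    n = torch.tensor(float(z2.size(0)))
    return joint.mean() - torch.logsumexp(marginal, dim=0) + torch.log(n)

# Toy training step on random data, showing the adversarial structure only.
m1, m2, critic = Member(), Member(), ConditionalCritic()
opt_members = torch.optim.Adam(list(m1.parameters()) + list(m2.parameters()), lr=1e-3)
opt_critic = torch.optim.Adam(critic.parameters(), lr=1e-3)
x = torch.randn(128, 32)
y = torch.randint(0, 10, (128,))
y_onehot = F.one_hot(y, 10).float()

# 1) Critic step: maximize the bound so it estimates conditional redundancy.
z1, _ = m1(x)
z2, _ = m2(x)
critic_loss = -conditional_mi_lower_bound(critic, z1.detach(), z2.detach(), y_onehot)
opt_critic.zero_grad(); critic_loss.backward(); opt_critic.step()

# 2) Members step: classification loss plus a penalty on the estimated redundancy.
z1, logits1 = m1(x)
z2, logits2 = m2(x)
ce = F.cross_entropy(logits1, y) + F.cross_entropy(logits2, y)
redundancy = conditional_mi_lower_bound(critic, z1, z2, y_onehot)
member_loss = ce + 0.1 * redundancy  # 0.1 is an arbitrary illustrative weight
opt_members.zero_grad(); member_loss.backward(); opt_members.step()
```

In this sketch, the critic plays the role of the adversary: it is trained to predict one member's features from the other's (given the class), while the members are trained so that this prediction fails, which is one way to instantiate "features should not be conditionally predictable from each other."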