Ensemble methods combine the predictions of multiple models to improve performance, but they incur significantly higher computation costs at inference time. To avoid these costs, multiple neural networks can be combined into one by averaging their weights (model soups). However, this usually performs significantly worse than ensembling. Weight averaging is only beneficial when the weights are different enough to benefit from being combined, yet similar enough (in weight or feature space) to average well. Based on this idea, we propose PopulAtion Parameter Averaging (PAPA): a method that combines the generality of ensembling with the efficiency of weight averaging. PAPA leverages a population of diverse models (trained on different data orders, augmentations, and regularizations) while occasionally (not too often, not too rarely) replacing the weights of the networks with the population average of the weights. PAPA reduces the performance gap between averaging and ensembling, increasing the average accuracy of a population of models by up to 1.1% on CIFAR-10, 2.4% on CIFAR-100, and 1.9% on ImageNet compared to training independent (non-averaged) models.
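The core loop described above is simple enough to sketch. Below is a minimal PyTorch-style illustration, assuming a population of identically shaped networks, each with its own optimizer and data stream; the names `average_weights` and `papa_train` and the interval `avg_every` are illustrative choices of ours, not the paper's actual API, and the paper tunes how often averaging happens rather than fixing it.

```python
import copy
import torch

def average_weights(models):
    # Replace every network's weights with the population average
    # (the occasional averaging step described above). Non-float
    # buffers (e.g., BatchNorm step counters) are kept from model 0.
    with torch.no_grad():
        avg_state = copy.deepcopy(models[0].state_dict())
        for key, val in avg_state.items():
            if val.is_floating_point():
                avg_state[key] = torch.stack(
                    [m.state_dict()[key] for m in models]
                ).mean(dim=0)
        for m in models:
            m.load_state_dict(avg_state)

def papa_train(models, optimizers, loaders, loss_fn, num_steps, avg_every=1000):
    # Each model trains on its own data order / augmentation stream;
    # every `avg_every` steps the whole population is reset to its
    # weight average. Assumes each loader yields >= num_steps batches.
    iters = [iter(loader) for loader in loaders]
    for step in range(num_steps):
        for model, opt, it in zip(models, optimizers, iters):
            x, y = next(it)
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
        if (step + 1) % avg_every == 0:
            average_weights(models)
```

The averaging interval captures the "not too often, not too rarely" trade-off: averaging every step would collapse the population's diversity, while never averaging reduces to training independent models.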