In Domain Generalization (DG) settings, models trained independently on a given set of training domains have notoriously chaotic performance on distribution-shifted test domains, and stochasticity in optimization (e.g., the random seed) plays a significant role. This makes deep learning models unreliable in real-world settings. We first show that this chaotic behavior exists even along the training optimization trajectory of a single model, and propose a simple model averaging protocol that both significantly boosts domain generalization and diminishes the impact of stochasticity by improving the rank correlation between in-domain validation accuracy and out-of-domain test accuracy, which is crucial for reliable early stopping. Taking advantage of this observation, we show that, instead of ensembling unaveraged models (as is typical in practice), ensembling moving-average models (EoA) from independent runs further boosts performance. We theoretically explain the performance gains of ensembling and model averaging by adapting the well-known bias-variance trade-off to the domain generalization setting. On the DomainBed benchmark, when using a pre-trained ResNet-50, this ensemble of averages achieves an average accuracy of $68.0\%$, beating vanilla ERM (without averaging or ensembling) by $\sim 4\%$; when using a pre-trained RegNetY-16GF, it achieves an average accuracy of $76.6\%$, beating vanilla ERM by $6\%$. Our code is available at \url{https://github.com/salesforce/ensemble-of-averages}.
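The two ingredients described above can be sketched in a few lines. The following is a minimal, illustrative NumPy sketch, not the paper's implementation: weights along one run's trajectory are combined with a uniform moving average, and the resulting averaged models are ensembled by averaging their softmax outputs. Function names and the uniform-averaging choice are assumptions for illustration.

```python
import numpy as np

def moving_average(checkpoints):
    """Uniform moving average of model weights along one training trajectory.

    `checkpoints` is a list of checkpoints, each a list of parameter arrays
    (one array per layer). Returns the per-layer average. Illustrative only:
    the actual protocol may start averaging after a warm-up iteration.
    """
    return [np.mean(layer_versions, axis=0) for layer_versions in zip(*checkpoints)]

def ensemble_predict(logits_per_model):
    """Ensemble of averages (EoA): average the softmax outputs of the
    averaged models from independent runs."""
    probs = []
    for logits in logits_per_model:
        exp = np.exp(logits - logits.max(axis=-1, keepdims=True))  # stable softmax
        probs.append(exp / exp.sum(axis=-1, keepdims=True))
    return np.mean(probs, axis=0)
```

In practice the same idea is applied to deep network parameters (e.g., via a running average updated during training) rather than to explicit checkpoint lists.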