In Domain Generalization (DG) settings, models trained on a given set of training domains notoriously exhibit chaotic performance on distribution-shifted test domains, and stochasticity in the optimization (e.g., the random seed) plays a large role. This makes deep learning models unreliable in real-world settings. We first show that a simple protocol for averaging model parameters along the optimization path, starting early during training, both significantly boosts domain generalization and diminishes the impact of stochasticity by improving the rank correlation between in-domain validation accuracy and out-of-domain test accuracy, which is crucial for reliable model selection. Next, we show that an ensemble of independently trained models also behaves chaotically in the DG setting. Taking advantage of this observation, we show that, instead of ensembling unaveraged models, ensembling moving-average models (EoA) from different runs increases stability and further boosts performance. On the DomainBed benchmark, using a ResNet-50 pre-trained on ImageNet, this ensemble of averages achieves $88.6\%$ on PACS, $79.1\%$ on VLCS, $72.5\%$ on OfficeHome, $52.3\%$ on TerraIncognita, and $47.4\%$ on DomainNet, for an average of $68.0\%$, beating ERM (without model averaging) by $\sim 4\%$. We also evaluate a model pre-trained on a larger dataset, where EoA achieves an average accuracy of $72.7\%$, beating its corresponding ERM baseline by $5\%$.
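As a rough illustration of the two ingredients described above, the sketch below shows (i) an incremental moving average of model parameters maintained along the optimization path and (ii) EoA-style inference that averages the predictions of moving-average models obtained from independent runs. This is a minimal PyTorch-style sketch, not the paper's implementation; the function names, the choice to average softmax probabilities, and the `start_averaging_step` variable in the usage comment are illustrative assumptions.

```python
import copy
import torch
import torch.nn as nn


@torch.no_grad()
def update_parameter_average(avg_model: nn.Module, model: nn.Module, n_averaged: int) -> int:
    """Incrementally update a running average of parameters: avg <- (avg * n + current) / (n + 1)."""
    for p_avg, p in zip(avg_model.parameters(), model.parameters()):
        p_avg.mul_(n_averaged / (n_averaged + 1)).add_(p, alpha=1.0 / (n_averaged + 1))
    return n_averaged + 1


@torch.no_grad()
def ensemble_of_averages(avg_models, x):
    """EoA-style inference: average the predictions (here, softmax probabilities)
    of moving-average models from independently trained runs."""
    probs = torch.stack([m(x).softmax(dim=-1) for m in avg_models], dim=0)
    return probs.mean(dim=0)


# Illustrative usage inside a training loop (hypothetical names):
#   avg_model = copy.deepcopy(model)  # snapshot when averaging begins, early in training
#   n = 0
#   for step, (x, y) in enumerate(loader):
#       ...optimizer step on `model`...
#       if step >= start_averaging_step:
#           n = update_parameter_average(avg_model, model, n)
#   # Collect the avg_model from several independent runs, then predict with
#   # ensemble_of_averages([avg_model_run1, avg_model_run2, ...], x_test).
```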