Traditional normalization techniques (e.g., Batch Normalization and Instance Normalization) generally rest on the simplistic assumption that training and test data follow the same distribution. Because distribution shifts are inevitable in real-world applications, models trained with these normalization methods can perform poorly in new environments. Can we develop new normalization methods that improve generalization robustness under distribution shifts? In this paper, we answer this question by proposing CrossNorm and SelfNorm. CrossNorm exchanges channel-wise mean and variance between feature maps to enlarge the training distribution, while SelfNorm uses attention to recalibrate the statistics and bridge the gap between training and test distributions. CrossNorm and SelfNorm complement each other, though they exploit feature statistics in different directions. Extensive experiments across fields (vision and language), tasks (classification and segmentation), settings (supervised and semi-supervised), and distribution shift types (synthetic and natural) demonstrate their effectiveness. Code is available at https://github.com/amazon-research/crossnorm-selfnorm
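As a rough illustration of the two operations described above, the PyTorch-style sketch below swaps channel-wise mean and standard deviation between two feature maps (the CrossNorm idea) and recalibrates a single map's statistics with small attention nets (the SelfNorm idea). This sketch is written for this summary only: the function and class names, the tiny `f`/`g` networks, and the exact wiring are assumptions, not the authors' implementation; see the linked repository for the official code.

```python
import torch
import torch.nn as nn


def crossnorm(x_a, x_b, eps=1e-5):
    """Exchange channel-wise mean/std between two (N, C, H, W) feature maps."""
    mean_a = x_a.mean(dim=(2, 3), keepdim=True)
    std_a = x_a.std(dim=(2, 3), keepdim=True) + eps
    mean_b = x_b.mean(dim=(2, 3), keepdim=True)
    std_b = x_b.std(dim=(2, 3), keepdim=True) + eps
    # Standardize each map with its own statistics, then re-scale with the other's.
    x_a2b = (x_a - mean_a) / std_a * std_b + mean_b
    x_b2a = (x_b - mean_b) / std_b * std_a + mean_a
    return x_a2b, x_b2a


class SelfNormSketch(nn.Module):
    """Attention-based recalibration of channel-wise mean/std.

    The tiny nets `f` and `g` are illustrative placeholders (assumed here),
    not the authors' exact attention functions.
    """

    def __init__(self):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(2, 8), nn.ReLU(), nn.Linear(8, 1), nn.Sigmoid())
        self.g = nn.Sequential(nn.Linear(2, 8), nn.ReLU(), nn.Linear(8, 1), nn.Sigmoid())

    def forward(self, x, eps=1e-5):
        n, c, _, _ = x.shape
        mean = x.mean(dim=(2, 3))                      # (N, C)
        std = x.std(dim=(2, 3)) + eps                  # (N, C)
        stats = torch.stack([mean, std], dim=-1).reshape(n * c, 2)
        new_mean = mean * self.f(stats).reshape(n, c)  # attention-scaled mean
        new_std = std * self.g(stats).reshape(n, c)    # attention-scaled std
        # Standardize, then re-scale with the recalibrated statistics.
        x_hat = (x - mean[..., None, None]) / std[..., None, None]
        return x_hat * new_std[..., None, None] + new_mean[..., None, None]


if __name__ == "__main__":
    a, b = torch.randn(4, 64, 32, 32), torch.randn(4, 64, 32, 32)
    a_cn, b_cn = crossnorm(a, b)       # training-time augmentation of statistics
    out = SelfNormSketch()(a)          # statistics recalibration of a single map
```

In this reading, CrossNorm acts as a training-time augmentation that diversifies feature statistics, whereas SelfNorm is applied to a single feature map at both training and test time.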