Recent work has shown that convolutional neural network classifiers overly rely on texture at the expense of shape cues. We make a similar but distinct distinction between shape and local image cues, on the one hand, and global image statistics, on the other. Our method, called Permuted Adaptive Instance Normalization (pAdaIN), reduces the representation of global statistics in the hidden layers of image classifiers. pAdaIN samples a random permutation $\pi$ that rearranges the samples in a given batch. Adaptive Instance Normalization (AdaIN) is then applied between the activations of each (non-permuted) sample $i$ and the corresponding activations of the sample $\pi(i)$, thus swapping statistics between the samples of the batch. Since the global image statistics are distorted, this swapping procedure causes the network to rely on other cues, such as shape or texture. By choosing the random permutation with probability $p$ and the identity permutation otherwise, one can control the strength of the effect. With the correct choice of $p$, fixed a priori for all experiments and selected without considering test data, our method consistently outperforms baselines in multiple settings. In image classification, our method improves on both CIFAR-100 and ImageNet using multiple architectures. In the setting of robustness, our method improves on both ImageNet-C and CIFAR-100-C for multiple architectures. In the setting of domain adaptation and domain generalization, our method achieves state-of-the-art results on the transfer learning task from GTAV to Cityscapes and on the PACS benchmark.
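The swapping operation described above can be illustrated with a minimal NumPy sketch. This is an assumption-laden illustration, not the authors' implementation: the function name `padain`, the `eps` stabilizer, and the use of per-channel instance mean and standard deviation follow the standard AdaIN formulation, where sample $i$ is instance-normalized and then re-scaled and re-shifted with the statistics of sample $\pi(i)$.

```python
import numpy as np

def padain(x, p=0.01, rng=None, eps=1e-5):
    """Sketch of Permuted Adaptive Instance Normalization (pAdaIN).

    x : activations of shape (N, C, H, W).
    With probability p, sample a random permutation pi of the batch and
    apply AdaIN between each sample i and sample pi(i); otherwise the
    identity permutation is used, which leaves x unchanged.
    """
    rng = np.random.default_rng() if rng is None else rng
    if rng.random() >= p:
        return x  # identity permutation: AdaIN(x_i, x_i) == x_i
    pi = rng.permutation(x.shape[0])
    # Per-sample, per-channel instance statistics over spatial dims.
    mu = x.mean(axis=(2, 3), keepdims=True)
    sigma = x.std(axis=(2, 3), keepdims=True) + eps
    normalized = (x - mu) / sigma
    # Re-inject the statistics of the permuted partner pi(i).
    return normalized * sigma[pi] + mu[pi]
```

With `p=1.0` the permutation is always applied, so each output sample carries the global (per-channel) statistics of its permuted partner while retaining its own normalized spatial content.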