This paper describes and analyzes a previously unreported phenomenon that we call "Early Transferability": adversarial perturbations transfer between different networks even at extremely early stages of training. Concretely, one can initialize two networks with independent random weights and measure the angle between their adversarial perturbations after each training step. We find that the two adversarial directions begin to align after only the first few training steps (which typically use only a small fraction of the available training data), even though the accuracy of both networks has not yet improved over its poor initial value. The purpose of this paper is to present this phenomenon experimentally and to propose plausible explanations for some of its properties.
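The measurement described above can be sketched in a few lines of NumPy. This is only an illustrative toy under stated assumptions, not the experimental setup of the paper: the two-layer tanh network, the synthetic Gaussian-cluster data, the use of the normalized input gradient of the cross-entropy loss as a stand-in for the adversarial direction, the choice to feed both networks the same mini-batches, and all names (`init_mlp`, `adversarial_direction`, `alignment_curve`) are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def init_mlp(rng, d_in, d_hidden, n_classes):
    # Independent random initialization; the scaling is an assumption.
    return {
        "W1": rng.normal(0.0, 1.0 / np.sqrt(d_in), (d_in, d_hidden)),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(d_hidden), (d_hidden, n_classes)),
    }

def forward(params, x):
    h = np.tanh(x @ params["W1"])
    return h, softmax(h @ params["W2"])

def sgd_step(params, x, y_onehot, lr):
    # One plain SGD step on the mean cross-entropy loss (manual backprop).
    h, p = forward(params, x)
    dlogits = (p - y_onehot) / len(x)
    dW2 = h.T @ dlogits
    dpre = (dlogits @ params["W2"].T) * (1.0 - h ** 2)
    dW1 = x.T @ dpre
    params["W1"] -= lr * dW1
    params["W2"] -= lr * dW2

def adversarial_direction(params, x, y_onehot):
    # Assumption: the unit-norm gradient of the loss w.r.t. the input
    # serves as a one-step proxy for the adversarial perturbation.
    h, p = forward(params, x[None, :])
    dpre = ((p - y_onehot[None, :]) @ params["W2"].T) * (1.0 - h ** 2)
    g = (dpre @ params["W1"].T)[0]
    return g / (np.linalg.norm(g) + 1e-12)

def alignment_curve(steps=20, seed_a=0, seed_b=1, d_in=10, d_hidden=16,
                    n_classes=3, batch=8, lr=0.1):
    # Synthetic data: Gaussian clusters around random class centers.
    data_rng = np.random.default_rng(123)
    centers = data_rng.normal(size=(n_classes, d_in))
    labels = data_rng.integers(0, n_classes, size=256)
    X = centers[labels] + 0.5 * data_rng.normal(size=(256, d_in))
    Y = np.eye(n_classes)[labels]

    # Two networks with independent random weights.
    net_a = init_mlp(np.random.default_rng(seed_a), d_in, d_hidden, n_classes)
    net_b = init_mlp(np.random.default_rng(seed_b), d_in, d_hidden, n_classes)

    x_probe, y_probe = X[0], Y[0]  # a single probe input, for brevity
    angles = []
    for _ in range(steps + 1):
        da = adversarial_direction(net_a, x_probe, y_probe)
        db = adversarial_direction(net_b, x_probe, y_probe)
        cos = float(np.clip(da @ db, -1.0, 1.0))
        angles.append(float(np.degrees(np.arccos(cos))))
        # Assumption: both networks are trained on the same mini-batch.
        idx = data_rng.integers(0, len(X), size=batch)
        sgd_step(net_a, X[idx], Y[idx], lr)
        sgd_step(net_b, X[idx], Y[idx], lr)
    return angles

if __name__ == "__main__":
    for step, angle in enumerate(alignment_curve()):
        print(f"step {step:2d}: angle = {angle:.1f} degrees")
```

The first entry of the returned list is the angle at initialization, and each later entry is the angle after one more training step. A single probe input gives a noisy curve, so in practice one would presumably average over many inputs; whether this toy setting reproduces the alignment reported here is not something the sketch establishes.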