Pre-training serves as a broadly adopted starting point for transfer learning on various downstream tasks. Recent investigations of the lottery ticket hypothesis (LTH) demonstrate that such enormous pre-trained models can be replaced by extremely sparse subnetworks (a.k.a. matching subnetworks) without sacrificing transferability. However, practical security-critical applications usually pose more challenging requirements beyond standard transfer, and also demand that these subnetworks overcome adversarial vulnerability. In this paper, we formulate a more rigorous concept, Double-Win Lottery Tickets, in which a subnetwork located in a pre-trained model can be independently transferred to diverse downstream tasks and reach BOTH the same standard and robust generalization, under BOTH standard and adversarial training regimes, as the full pre-trained model. We comprehensively examine various pre-training mechanisms and find that robust pre-training tends to craft sparser double-win lottery tickets with superior performance over the standard counterparts. For example, on the downstream CIFAR-10/100 datasets, we identify double-win matching subnetworks with standard, fast adversarial, and adversarial pre-training from ImageNet, at 89.26%/73.79%, 89.26%/79.03%, and 91.41%/83.22% sparsity, respectively. Furthermore, we observe that the obtained double-win lottery tickets can be more data-efficient to transfer under practical data-limited (e.g., 1% and 10%) downstream schemes. Our results show that the benefits of robust pre-training are amplified by the lottery ticket scheme, as well as by the data-limited transfer setting. Code is available at https://github.com/VITA-Group/Double-Win-LTH.
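The sparsity levels quoted above are consistent with an iterative pruning schedule that removes a fixed fraction of the remaining weights in each round. The abstract does not state the schedule, so the sketch below is an assumption: it uses a 20% per-round pruning fraction, a common choice in the LTH literature, and the function name `sparsity_after_rounds` is hypothetical, introduced only for illustration.

```python
def sparsity_after_rounds(rounds: int, prune_fraction: float = 0.2) -> float:
    """Return the percentage of weights removed after `rounds` pruning rounds.

    Assumption (not stated in the abstract): each round removes
    `prune_fraction` of the weights that are still remaining.
    """
    remaining = (1.0 - prune_fraction) ** rounds
    return 100.0 * (1.0 - remaining)


if __name__ == "__main__":
    # Round counts chosen to compare against the sparsities in the abstract.
    for k in (6, 7, 8, 10, 11):
        print(f"{k:2d} rounds -> {sparsity_after_rounds(k):.2f}% sparsity")
```

Under this assumed schedule, 6, 7, 8, 10, and 11 rounds give 73.79%, 79.03%, 83.22%, 89.26%, and 91.41% sparsity, which matches the reported values. The match suggests, but does not confirm, that the tickets were found at successive rounds of such a schedule.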