Recent work on the Lottery Ticket Hypothesis has shown that pre-trained language models (PLMs) contain smaller matching subnetworks (winning tickets) that can reach accuracy comparable to the original models. However, these tickets have been shown to be not robust to adversarial examples, performing even worse than their PLM counterparts. To address this problem, we propose a novel method that learns binary weight masks to identify robust tickets hidden in the original PLMs. Since the loss is not differentiable with respect to the binary masks, we place a hard concrete distribution over the masks and encourage their sparsity with a smooth approximation of L0 regularization. Furthermore, we design an adversarial loss objective to guide the search for robust tickets and to ensure that the tickets perform well in both accuracy and robustness. Experimental results show that the proposed method significantly improves over previous work on adversarial robustness evaluations.
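The mask-learning step can be illustrated with a minimal sketch of the hard concrete gate and its expected-L0 surrogate, following the standard parameterization of Louizos et al. (2018); the function names, hyperparameter values (beta, gamma, zeta), and the NumPy implementation are illustrative assumptions, not the paper's actual code.

```python
import numpy as np

def sample_hard_concrete(log_alpha, beta=2/3, gamma=-0.1, zeta=1.1, rng=None):
    """Sample a (near-)binary mask from the hard concrete distribution.

    log_alpha: learnable per-weight logits controlling how likely each
    gate is to be open. Hyperparameters follow common defaults (assumed).
    """
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(1e-6, 1 - 1e-6, size=np.shape(log_alpha))
    # Concrete (Gumbel-sigmoid) relaxation of a Bernoulli gate
    s = 1.0 / (1.0 + np.exp(-(np.log(u) - np.log(1 - u) + log_alpha) / beta))
    # Stretch to (gamma, zeta), then clamp to [0, 1] to allow exact 0s and 1s
    return np.clip(s * (zeta - gamma) + gamma, 0.0, 1.0)

def expected_l0(log_alpha, beta=2/3, gamma=-0.1, zeta=1.1):
    """Smooth surrogate for the L0 norm: probability each gate is nonzero."""
    return 1.0 / (1.0 + np.exp(-(log_alpha - beta * np.log(-gamma / zeta))))

# Toy usage: strongly negative logits push gates toward 0 (weight pruned),
# strongly positive logits toward 1 (weight kept).
log_alpha = np.array([-5.0, -5.0, 5.0])
mask = sample_hard_concrete(log_alpha, rng=np.random.default_rng(0))
penalty = expected_l0(log_alpha).sum()  # added to the adversarial training loss
```

Because the clamp produces exact zeros and ones with nonzero probability while the sampling path stays differentiable in `log_alpha`, the L0 penalty can be minimized jointly with the (adversarial) task loss by gradient descent.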