It is becoming increasingly common to use pre-trained models provided by third parties because of their convenience. At the same time, however, these models may be vulnerable to both poisoning and evasion attacks. We introduce an algorithmic framework that can mitigate potential security vulnerabilities in a pre-trained model when clean data from its training distribution is unavailable to the defender. The framework reverse-engineers samples from a given pre-trained model, and the resulting synthetic samples can then be used as a substitute for clean data in performing various defenses. We consider two important attack scenarios -- backdoor attacks and evasion attacks -- to showcase the utility of the synthesized samples. For both attacks, we show that when supplied with our synthetic data, state-of-the-art defenses perform comparably to, and sometimes even better than, the case in which they are supplied with the same amount of clean data.
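The abstract does not spell out the synthesis procedure. As a rough, hedged illustration of what "reverse-engineering samples" from a frozen classifier can look like, the PyTorch sketch below performs model inversion by gradient descent on the inputs themselves: random noise is optimized until the pre-trained model assigns it high confidence for chosen labels. The function names, hyperparameters, and the total-variation regularizer are all assumptions for this sketch, not the paper's actual algorithm.

```python
import torch
import torch.nn.functional as F

def total_variation(x):
    # Sum of absolute differences between neighboring pixels;
    # encourages spatially smooth, image-like inputs (assumed regularizer).
    return (x[..., 1:, :] - x[..., :-1, :]).abs().sum() + \
           (x[..., :, 1:] - x[..., :, :-1]).abs().sum()

def synthesize_samples(model, num_samples, num_classes, input_shape,
                       steps=500, lr=0.05, tv_weight=1e-4):
    """Optimize noise inputs so the frozen model classifies them
    confidently, yielding synthetic stand-ins for clean data."""
    model.eval()
    # Start from Gaussian noise; the inputs are the trainable parameters.
    x = torch.randn(num_samples, *input_shape, requires_grad=True)
    y = torch.randint(0, num_classes, (num_samples,))
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        logits = model(x)
        # Cross-entropy pulls each input toward its target class;
        # the TV term keeps the result from degenerating into noise.
        loss = F.cross_entropy(logits, y) + tv_weight * total_variation(x)
        loss.backward()
        opt.step()
    return x.detach(), y
```

Such synthesized labeled pairs could then be fed to a downstream defense (e.g., fine-tuning-based backdoor removal or adversarial training) exactly where that defense would otherwise consume held-out clean data, which is the substitution role the abstract describes.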