Companies often safeguard their trained deep models (i.e., architecture details, learnt weights, training details, etc.) from third-party users by exposing them only as black boxes through APIs. Moreover, they may not provide access to the training data, for proprietary reasons or because of sensitivity concerns. We make the first attempt to provide adversarial robustness to black box models in a data-free setup. We construct synthetic data via a generative model and train a surrogate network using model stealing techniques. To minimize adversarial contamination on perturbed samples, we propose a 'wavelet noise remover' (WNR) that performs discrete wavelet decomposition on input images and carefully selects only a few important coefficients, determined by our 'wavelet coefficient selection module' (WCSM). To recover the high-frequency content of the image after noise removal via WNR, we further train a 'regenerator' network with the objective of retrieving the coefficients such that the reconstructed image yields predictions on the surrogate model similar to the original ones. At test time, WNR combined with the trained regenerator network is prepended to the black box network, resulting in a large boost in adversarial accuracy. Our method improves the adversarial accuracy on CIFAR-10 by 38.98% and 32.01% against the state-of-the-art Auto Attack compared to the baseline, even when the attacker uses a surrogate architecture (Alexnet-half and Alexnet) similar to the black box architecture (Alexnet) with the same model stealing strategy as the defender. The code is available at https://github.com/vcl-iisc/data-free-black-box-defense
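To make the wavelet noise removal step concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single-level Haar transform on a grayscale image with even side lengths, and it replaces the learned WCSM with a simple stand-in rule: keep the approximation band and only the largest-magnitude fraction of detail coefficients. The function names and the `keep_fraction` parameter are hypothetical and chosen for illustration only.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_dwt2(img):
    """Single-level orthonormal 2D Haar transform of an even-sized 2D array."""
    # Pair adjacent rows (axis 0): sum gives low-pass, difference gives high-pass.
    a = (img[0::2, :] + img[1::2, :]) / SQRT2
    d = (img[0::2, :] - img[1::2, :]) / SQRT2
    # Pair adjacent columns (axis 1) of each intermediate band.
    ll = (a[:, 0::2] + a[:, 1::2]) / SQRT2
    lh = (a[:, 0::2] - a[:, 1::2]) / SQRT2
    hl = (d[:, 0::2] + d[:, 1::2]) / SQRT2
    hh = (d[:, 0::2] - d[:, 1::2]) / SQRT2
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Exact inverse of haar_dwt2."""
    h, w = ll.shape
    a = np.empty((h, 2 * w))
    d = np.empty((h, 2 * w))
    a[:, 0::2], a[:, 1::2] = (ll + lh) / SQRT2, (ll - lh) / SQRT2
    d[:, 0::2], d[:, 1::2] = (hl + hh) / SQRT2, (hl - hh) / SQRT2
    img = np.empty((2 * h, 2 * w))
    img[0::2, :], img[1::2, :] = (a + d) / SQRT2, (a - d) / SQRT2
    return img

def wavelet_denoise(img, keep_fraction):
    """Keep the approximation band and the top `keep_fraction` of detail
    coefficients by magnitude (an assumed stand-in for the paper's WCSM)."""
    ll, lh, hl, hh = haar_dwt2(np.asarray(img, dtype=float))
    details = np.stack([lh, hl, hh])
    k = int(round(keep_fraction * details.size))
    if k == 0:
        details = np.zeros_like(details)
    else:
        # Threshold at the k-th largest magnitude; ties may keep slightly more than k.
        thresh = np.sort(np.abs(details).ravel())[-k]
        details = details * (np.abs(details) >= thresh)
    return haar_idwt2(ll, *details)
```

With `keep_fraction=1.0` the input is reconstructed exactly; with `keep_fraction=0.0` every 2x2 block collapses to its mean, which illustrates the loss of high-frequency content that the regenerator network is trained to recover.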