Backdoor attacks have been shown to be a serious threat against deep learning systems such as biometric authentication and autonomous driving. An effective backdoor attack can force the model to misbehave under certain predefined conditions, i.e., triggers, while behaving normally otherwise. However, the triggers of existing attacks are injected directly in the pixel space, which tends to make them detectable by existing defenses and visually identifiable at both training and inference stages. In this paper, we propose FTROJAN, a new backdoor attack that trojans the frequency domain. The key intuition is that triggering perturbations in the frequency domain correspond to small pixel-wise perturbations dispersed across the entire image, breaking the underlying assumptions of existing defenses and making the poisoned images visually indistinguishable from clean ones. We evaluate FTROJAN on several datasets and tasks, showing that it achieves a high attack success rate without significantly degrading the prediction accuracy on benign inputs. Moreover, the trigger is nearly invisible and the poisoned images retain high perceptual quality. We also evaluate FTROJAN against state-of-the-art defenses as well as several adaptive defenses designed for the frequency domain. The results show that FTROJAN can robustly evade these defenses or significantly degrade their performance.
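To make the key intuition concrete, the following is a minimal sketch of injecting a trigger in the frequency domain via the 2-D DCT and inverting back to pixel space. The coefficient positions, perturbation magnitude, and the helper name add_frequency_trigger are hypothetical choices for illustration, not the paper's published configuration.

```python
# A minimal sketch of a frequency-domain trigger: perturb a few DCT
# coefficients per channel, then invert. The positions and magnitude
# below are assumed values for illustration, not FTROJAN's settings.
import numpy as np
from scipy.fft import dctn, idctn

def add_frequency_trigger(image, positions=((15, 31), (31, 15)), magnitude=30.0):
    """image: HxWxC float array in [0, 255]."""
    poisoned = image.astype(np.float64).copy()
    for c in range(poisoned.shape[2]):
        coeffs = dctn(poisoned[:, :, c], norm="ortho")   # 2-D DCT of one channel
        for (u, v) in positions:
            coeffs[u, v] += magnitude                    # trigger lives in frequency space
        poisoned[:, :, c] = idctn(coeffs, norm="ortho")  # back to pixel space
    return np.clip(poisoned, 0, 255)

# A single shifted coefficient spreads into a tiny change at every pixel,
# which is why the poisoned image stays visually close to the original.
clean = np.random.uniform(0, 255, size=(32, 32, 3))
poisoned = add_frequency_trigger(clean)
print(np.abs(poisoned - clean).max())  # small per-pixel perturbation
```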