Backdoor inversion, the process of finding a backdoor trigger inserted into a machine learning model, has become a pillar of many backdoor detection and defense methods. Previous works on backdoor inversion often recover the backdoor through an optimization process that flips a support set of clean images into the target class. However, how large this support set must be to recover a successful backdoor has rarely been studied or understood. In this work, we show that one can reliably recover the backdoor trigger with as few as a single image. Specifically, we propose the SmoothInv method, which first constructs a robust smoothed version of the backdoored classifier and then performs guided image synthesis towards the target class to reveal the backdoor pattern. SmoothInv requires neither explicit modeling of the backdoor via a mask variable nor complex regularization schemes, both of which have become standard practice in backdoor inversion methods. We perform both quantitative and qualitative studies on backdoored classifiers from previously published backdoor attacks. We demonstrate that, compared to existing methods, SmoothInv is able to recover successful backdoors from single images while maintaining high fidelity to the original backdoor. We also show how to identify the target class of the backdoor from the backdoored classifier. Finally, we propose and analyze two countermeasures to our approach and show that SmoothInv remains robust in the face of an adaptive attacker. Our code is available at https://github.com/locuslab/smoothinv .
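The two ingredients named above, a smoothed version of the classifier and guided synthesis towards the target class from a single image, can be illustrated with a toy sketch. This is not the SmoothInv implementation (see the linked repository for that). It assumes a hypothetical linear classifier, a soft Gaussian-smoothed classifier (the noise-averaged softmax, chosen so it is differentiable) estimated by Monte Carlo, and normalized gradient ascent on the smoothed target-class probability, projected onto an L2 ball around the input. All function names and parameters (`sigma`, `eps`, `step`, `iters`, `n_samples`) are illustrative assumptions.

```python
import math
import random


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def logits(W, b, x):
    # Hypothetical linear toy classifier: z_k = W[k] . x + b[k]
    return [sum(wi * xi for wi, xi in zip(row, x)) + bk for row, bk in zip(W, b)]


def smoothed_prob_and_grad(W, b, x, target, sigma, n_samples, rng):
    """Monte Carlo estimate of g_t(x) = E_delta[softmax(f(x + delta))_t],
    delta ~ N(0, sigma^2 I), together with its gradient with respect to x."""
    d, num_classes = len(x), len(W)
    g_t, grad = 0.0, [0.0] * d
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        p = softmax(logits(W, b, noisy))
        g_t += p[target]
        # d p_t / d x = sum_k p_t * (1[k == t] - p_k) * W[k]  (linear model only)
        for k in range(num_classes):
            coef = p[target] * ((1.0 if k == target else 0.0) - p[k])
            for j in range(d):
                grad[j] += coef * W[k][j]
    return g_t / n_samples, [gj / n_samples for gj in grad]


def synthesize_toward_target(W, b, x0, target, sigma=0.5, eps=3.0,
                             step=0.5, iters=100, n_samples=64, seed=0):
    """Push a single input x0 towards `target` under the smoothed classifier."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(iters):
        _, grad = smoothed_prob_and_grad(W, b, x, target, sigma, n_samples, rng)
        # Normalized ascent step; grad log g_t has the same direction as grad g_t.
        norm = math.sqrt(sum(v * v for v in grad)) or 1.0
        x = [xi + step * gi / norm for xi, gi in zip(x, grad)]
        # Project the perturbation back onto the L2 ball of radius eps around x0.
        delta = [xi - x0i for xi, x0i in zip(x, x0)]
        dn = math.sqrt(sum(v * v for v in delta))
        if dn > eps:
            x = [x0i + v * eps / dn for x0i, v in zip(x0, delta)]
    return x
```

With a real image classifier, the gradient would come from automatic differentiation instead of the closed form above, and the perturbation found this way is what one would inspect for a trigger-like pattern.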