Backdoor attacks aim to mislead machine-learning models into outputting an attacker-specified class when presented with a specific trigger at test time. These attacks require poisoning the training data or compromising the learning algorithm, e.g., by injecting poisoning samples containing the trigger, along with the desired class label, into the training set. Despite the increasing number of studies on backdoor attacks and defenses, the underlying factors affecting the success of backdoor attacks, along with their impact on the learning algorithm, are not yet well understood. In this work, we aim to shed light on this issue. In particular, we unveil that backdoor attacks work by inducing a smoother decision function around the triggered samples -- a phenomenon which we refer to as \textit{backdoor smoothing}. We quantify backdoor smoothing by defining a measure that evaluates the uncertainty associated with the predictions of a classifier around the input samples. Our experiments show that smoothness increases when the trigger is added to the input samples, and that this phenomenon is more pronounced for more successful attacks. However, our experiments also show that patterns inducing backdoor smoothing can be crafted even without poisoning the training data. Although our measure may not be directly exploited as a defense mechanism, it unveils an important phenomenon which may pave the way towards understanding the limitations of current defenses that rely on the smoothness of the decision output around backdoors.
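To give a concrete, if simplified, intuition of the kind of uncertainty measure described above, the following Python sketch estimates how stable a classifier's predictions are in a small neighbourhood of an input by averaging the predicted class distribution over Gaussian perturbations and computing its entropy. This is an illustrative assumption, not the exact measure defined in the paper: the function names (\texttt{neighbourhood\_uncertainty}, \texttt{toy\_predict\_proba}), the Gaussian perturbation model, and the choice of entropy as the uncertainty statistic are all hypothetical.

\begin{verbatim}
import numpy as np

def neighbourhood_uncertainty(predict_proba, x, sigma=0.05,
                              n_samples=100, rng=None):
    """Entropy of the mean predicted distribution over a Gaussian
    neighbourhood of x (lower entropy ~ smoother, more confident region).

    predict_proba: callable mapping a batch of inputs to class probabilities.
    x: a single input sample as a numpy array.
    sigma, n_samples: assumed perturbation scale and Monte Carlo budget.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(scale=sigma, size=(n_samples,) + x.shape)
    probs = predict_proba(x[None, ...] + noise)   # (n_samples, n_classes)
    mean_probs = probs.mean(axis=0)
    return -np.sum(mean_probs * np.log(mean_probs + 1e-12))

if __name__ == "__main__":
    # Toy two-class "classifier" standing in for a trained model.
    def toy_predict_proba(batch):
        logits = batch.reshape(len(batch), -1).sum(axis=1, keepdims=True)
        p1 = 1.0 / (1.0 + np.exp(-logits))
        return np.hstack([1.0 - p1, p1])

    x_clean = np.zeros(10)
    x_triggered = x_clean.copy()
    x_triggered[:3] = 5.0   # crude stand-in for adding a trigger pattern
    print(neighbourhood_uncertainty(toy_predict_proba, x_clean))      # high
    print(neighbourhood_uncertainty(toy_predict_proba, x_triggered))  # low
\end{verbatim}

In this toy setup the triggered input pushes the classifier deep into one class, so perturbations barely change the prediction and the neighbourhood entropy drops, mimicking the smoothing effect described in the abstract; the clean input sits near the decision boundary and yields high entropy.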