Dataset distillation has emerged as a prominent technique for improving data efficiency when training machine learning models. It encapsulates the knowledge of a large dataset into a smaller synthetic dataset; a model trained on this distilled dataset can attain performance comparable to that of a model trained on the original training dataset. However, existing dataset distillation techniques mainly aim at achieving the best trade-off between resource-usage efficiency and model utility, and the security risks they introduce have not been explored. This study performs the first backdoor attack against models trained on data distilled by dataset distillation in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers throughout the distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical results show that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches high ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent them.
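To make the distinction between the two attacks concrete, the following is a minimal sketch of the NAIVEATTACK idea: stamping a fixed trigger patch onto a fraction of the raw training images, and relabeling them to the target class, before the distillation procedure runs on the poisoned data. All function names, shapes, and parameters here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def naive_attack_poison(images, labels, target_label, poison_ratio=0.1,
                        trigger_size=3, trigger_value=1.0, seed=0):
    """Illustrative NAIVEATTACK-style poisoning (hypothetical helper):
    stamp a small solid patch onto a random fraction of the raw images
    and relabel them to the target class *before* distillation runs.

    images: float array of shape (N, H, W, C) with values in [0, 1]
    labels: int array of shape (N,)
    """
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n_poison = int(len(images) * poison_ratio)
    idx = rng.choice(len(images), size=n_poison, replace=False)
    # Place the trigger in the bottom-right corner of each chosen image.
    images[idx, -trigger_size:, -trigger_size:, :] = trigger_value
    labels[idx] = target_label
    return images, labels
```

DOORPING, by contrast, treats the trigger itself as an optimization variable and refreshes it between distillation iterations. The sketch below is a stand-in under simplified assumptions: it re-optimizes the trigger with a plain cross-entropy-to-target objective against the current model, whereas the paper's exact objective and update schedule may differ.

```python
import torch
import torch.nn.functional as F

def doorping_update_trigger(model, trigger, images, target_label,
                            mask, lr=0.01, steps=20):
    """Illustrative DOORPING-style trigger update (hypothetical helper):
    between distillation iterations, re-optimize the trigger patch so
    that triggered inputs are classified as the target label by the
    current model. `mask` is a {0,1} tensor marking the trigger region;
    `trigger` and `mask` must broadcast against `images`.
    """
    trigger = trigger.clone().requires_grad_(True)
    opt = torch.optim.Adam([trigger], lr=lr)
    targets = torch.full((len(images),), target_label, dtype=torch.long)
    for _ in range(steps):
        # Stamp the current trigger onto the images inside the mask region.
        stamped = images * (1 - mask) + trigger * mask
        loss = F.cross_entropy(model(stamped), targets)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return trigger.detach()
```

In both cases, the point the abstract emphasizes is that poisoning happens inside the distillation pipeline itself, so the backdoor is baked into the synthetic dataset that downstream users later train on.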