Commoditization and broad adoption of machine learning (ML) technologies expose users of these technologies to new security risks. Many models today are based on neural networks. Training and deploying these models for real-world applications involves complex hardware and software pipelines applied to training data from many sources. Models trained on untrusted data are vulnerable to poisoning attacks that introduce "backdoor" functionality. Compromising a fraction of the training data requires few resources from the attacker, but defending against these attacks is a challenge. Although dozens of defenses have been proposed in the research literature, most of them are expensive to integrate into, or incompatible with, existing training pipelines. In this paper, we take a pragmatic, developer-centric view and show how practitioners can answer two actionable questions: (1) how robust is my model to backdoor poisoning attacks, and (2) how can I make it more robust without changing the training pipeline? We focus on the size of the compromised subset of the training data as a universal metric. We propose an easy-to-learn primitive sub-task to estimate this metric, thus providing a baseline on robustness to backdoor poisoning. Next, we show how to leverage hyperparameter search, a tool that ML developers already use extensively, to balance the model's accuracy and robustness to poisoning without changes to the training pipeline. We demonstrate how to use our metric to estimate the robustness of models to backdoor attacks. We then design, implement, and evaluate Mithridates, a multi-stage hyperparameter search method that strengthens robustness by 3-5x with only a slight impact on the model's accuracy. We show that the hyperparameters found by our method increase robustness against multiple types of backdoor attacks, and we extend our method to AutoML and federated learning.
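To make the metric concrete, the following is a minimal sketch (not the paper's implementation) of probing a pipeline with an easy-to-learn primitive sub-task: plant a trivial trigger, here a single feature forced to an out-of-range value, in increasing fractions of the training data and record the smallest fraction at which the trained model learns the sub-task. The dataset, model, trigger, and all parameter values below are illustrative assumptions.

```python
# Sketch: estimate the smallest compromised fraction a pipeline will learn.
# Everything here (trigger design, model, thresholds) is a hypothetical stand-in.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=4000, n_features=20,
                           n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

TRIGGER_DIM, TRIGGER_VAL, TARGET = 0, 8.0, 0  # primitive trigger: one out-of-range feature

def poison(X, y, fraction):
    """Plant the primitive sub-task in a random `fraction` of the training data."""
    Xp, yp = X.copy(), y.copy()
    idx = rng.choice(len(X), size=int(fraction * len(X)), replace=False)
    Xp[idx, TRIGGER_DIM] = TRIGGER_VAL
    yp[idx] = TARGET
    return Xp, yp

def attack_success(model, X):
    """Fraction of triggered test inputs classified as the target label."""
    Xt = X.copy()
    Xt[:, TRIGGER_DIM] = TRIGGER_VAL
    return (model.predict(Xt) == TARGET).mean()

# Sweep the compromised fraction; the smallest fraction at which the
# sub-task is reliably learned is the robustness estimate for this pipeline.
for fraction in (0.001, 0.005, 0.01, 0.05):
    Xp, yp = poison(X_train, y_train, fraction)
    model = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300,
                          random_state=0).fit(Xp, yp)
    print(f"fraction={fraction:.3f}  "
          f"clean_acc={model.score(X_test, y_test):.2f}  "
          f"attack_success={attack_success(model, X_test):.2f}")
```

The fraction at which attack success jumps serves as the robustness estimate; the same probe can then supply the robustness term in an ordinary hyperparameter search objective that trades off clean accuracy against this fraction, in the spirit of the multi-stage search described above.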