With Great Dispersion Comes Greater Resilience: Efficient Poisoning Attacks and Defenses for Linear Regression Models

With the rise of third parties in the machine learning pipeline, such as the service provider in "Machine Learning as a Service" (MLaaS), external data contributors in online learning, or parties involved in retraining existing models, ensuring the security of the resulting machine learning models has become an increasingly important topic. The security community has demonstrated that, without transparency of the data and the resulting model, many potential security risks exist, and new risks are constantly being discovered. In this paper, we focus on one of these security risks: poisoning attacks. Specifically, we analyze how attackers may interfere with the results of regression learning by poisoning the training datasets. To this end, we analyze and develop a new poisoning attack algorithm. Our attack, termed Nopt, produces larger errors than previous poisoning attack algorithms for the same proportion of poisoning data-points. Furthermore, we significantly improve the state-of-the-art defense algorithm, termed TRIM, proposed by Jagielski et al. (IEEE S&P 2018), by incorporating the concept of probability estimation of clean data-points into the algorithm. Our new defense algorithm, termed Proda, is more effective at reducing the errors that arise from poisoned datasets, which it achieves by optimizing ensemble models. We highlight that the time complexity of TRIM had not been estimated; however, we deduce from their work that TRIM can take exponential time in the worst case, whereas Proda runs in logarithmic time. The performance of both our proposed attack and defense algorithms is extensively evaluated on four real-world datasets of housing prices, loans, health care, and bike sharing services. We hope that our work will inspire future research to develop more robust learning algorithms immune to poisoning attacks.
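To make the setting concrete, the following is a minimal sketch of a poisoned linear regression and a trimmed-loss defense in the spirit of TRIM. It is not Nopt or Proda, whose details the abstract does not give. The synthetic data, the naive poisoning rule (responses forced to zero), the use of plain least squares without regularization, and every function name (`fit_ols`, `trimmed_fit`, `make_poisoned_data`) are assumptions made purely for illustration.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns [slope..., intercept]."""
    A = np.column_stack([X, np.ones(len(X))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def predict(w, X):
    return np.column_stack([X, np.ones(len(X))]) @ w

def trimmed_fit(X, y, n_keep, n_iter=50, seed=0):
    """Trimmed-loss regression (illustrative, TRIM-like).

    Alternates between fitting on a subset of n_keep points and re-selecting
    the n_keep points with the smallest squared residuals, until the subset
    stops changing or n_iter is reached.
    """
    rng = np.random.default_rng(seed)
    keep = np.sort(rng.choice(len(X), size=n_keep, replace=False))
    for _ in range(n_iter):
        w = fit_ols(X[keep], y[keep])
        residuals = (predict(w, X) - y) ** 2
        new_keep = np.sort(np.argsort(residuals)[:n_keep])
        if np.array_equal(new_keep, keep):
            break
        keep = new_keep
    return w, keep

def make_poisoned_data(n_clean=200, n_poison=30, seed=0):
    """Hypothetical data: a clean line y = 2x + 0.5 plus naive poison points."""
    rng = np.random.default_rng(seed)
    X_clean = rng.uniform(0.0, 1.0, size=(n_clean, 1))
    y_clean = 2.0 * X_clean[:, 0] + 0.5 + rng.normal(0.0, 0.05, n_clean)
    # Poison points sit at large x with responses pushed far below the line.
    X_poison = rng.uniform(0.8, 1.0, size=(n_poison, 1))
    y_poison = np.zeros(n_poison)
    X = np.vstack([X_clean, X_poison])
    y = np.concatenate([y_clean, y_poison])
    return X, y, X_clean, y_clean

if __name__ == "__main__":
    X, y, X_clean, y_clean = make_poisoned_data()
    w_ols = fit_ols(X, y)
    w_trim, kept = trimmed_fit(X, y, n_keep=len(X_clean))
    mse = lambda w: float(np.mean((predict(w, X_clean) - y_clean) ** 2))
    print("clean-data MSE, plain OLS on poisoned set:", mse(w_ols))
    print("clean-data MSE, trimmed fit:", mse(w_trim))
```

The sketch assumes the defender knows how many clean points to keep (`n_keep`), which is a simplification. The poison here is deliberately crude, whereas an optimized attack such as the one the abstract describes would choose its points to be much harder to separate by residual alone.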

