Machine learning is becoming ubiquitous. From finance to medicine, machine learning models are boosting decision-making processes and even outperforming humans in some tasks. This huge progress in prediction quality, however, has no counterpart in the security of such models and of the corresponding predictions, where perturbing a fraction of the training set (poisoning) can seriously undermine model accuracy. Research on poisoning attacks and defenses predates the introduction of deep neural networks and has produced several promising solutions. Among them, ensemble-based defenses, where different models are trained on portions of the training set and their predictions are then aggregated, are attracting significant attention due to their relative simplicity and their theoretical and practical guarantees. This paper designs and implements a hash-based ensemble approach to ML robustness and evaluates its applicability and performance on random forests, a machine learning model that has proven more resistant to poisoning attempts on tabular datasets. An extensive experimental evaluation assesses the robustness of our approach against a variety of attacks and compares it with a traditional monolithic model based on random forests.
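To make the hash-based ensemble idea concrete, the following is a minimal sketch, not the authors' implementation: each training sample is deterministically assigned to a partition by hashing its feature vector, one random forest is trained per partition, and test-time predictions are aggregated by majority vote, so a poisoned sample can influence only the single sub-model that received it. The class name `HashEnsemble` and parameters such as `n_partitions` are illustrative assumptions.

```python
import hashlib
import numpy as np
from sklearn.ensemble import RandomForestClassifier


class HashEnsemble:
    """Illustrative hash-partitioned ensemble of random forests (a sketch,
    not the paper's exact implementation)."""

    def __init__(self, n_partitions=10, **rf_params):
        self.n_partitions = n_partitions
        self.rf_params = rf_params
        self.models = []

    def _partition_key(self, row):
        # Deterministic hash of the raw feature bytes -> partition index.
        digest = hashlib.sha256(np.ascontiguousarray(row).tobytes()).hexdigest()
        return int(digest, 16) % self.n_partitions

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        keys = np.array([self._partition_key(row) for row in X])
        self.models = []
        for p in range(self.n_partitions):
            idx = np.where(keys == p)[0]
            # For the sketch we assume every partition is non-empty and
            # contains more than one class; a real implementation would
            # handle degenerate partitions explicitly.
            rf = RandomForestClassifier(**self.rf_params)
            rf.fit(X[idx], y[idx])
            self.models.append(rf)
        return self

    def predict(self, X):
        X = np.asarray(X)
        # votes has shape (n_partitions, n_samples).
        votes = np.stack([m.predict(X) for m in self.models], axis=0)
        preds = []
        for col in votes.T:
            vals, counts = np.unique(col, return_counts=True)
            preds.append(vals[np.argmax(counts)])  # majority vote per sample
        return np.array(preds)


if __name__ == "__main__":
    # Toy usage on a synthetic tabular dataset.
    from sklearn.datasets import make_classification
    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    ens = HashEnsemble(n_partitions=5, n_estimators=50, random_state=0)
    ens.fit(X, y)
    print("train accuracy:", (ens.predict(X) == y).mean())
```

Because partition membership depends only on the hash of each sample, an attacker who injects or perturbs a few training points can corrupt at most the sub-models those points hash into, which is the intuition behind the robustness guarantees of partition-based ensemble defenses.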