Machine learning (ML) is a field of artificial intelligence (AI) that is becoming essential to several critical systems, which makes it an attractive target for threat actors. These actors use different Tactics, Techniques, and Procedures (TTPs) against the confidentiality, integrity, and availability of ML systems. During the ML cycle, they use adversarial TTPs to poison data and fool ML-based systems. In recent years, multiple security practices have been proposed for traditional systems, but they are not sufficient to cope with the nature of ML-based systems. In this paper, we conduct an empirical study of threats reported against ML-based systems, with the aim of understanding and characterizing the nature of ML threats and identifying common mitigation strategies. The study is based on 89 real-world ML attack scenarios from MITRE's ATLAS database, the AI Incident Database, and the literature, and on 854 ML repositories from GitHub search and the Python Packaging Advisory database, selected based on their reputation. Attacks from the AI Incident Database and the literature are used to identify vulnerabilities and new types of threats that were not documented in ATLAS. Results show that convolutional neural networks were among the most targeted models in the attack scenarios. ML repositories with the highest vulnerability prominence include TensorFlow, OpenCV, and Notebook. We also report the most frequent vulnerabilities in the studied ML repositories, the most targeted ML phases and models, and the most used TTPs in ML phases and attack scenarios. This information is particularly important for red and blue teams to better conduct attacks and defenses, for practitioners to prevent threats during ML development, and for researchers to develop efficient defense mechanisms.