Recent advances in deep learning have been observed across various fields, including computer vision, natural language processing, and cybersecurity. Machine learning (ML) has proven to be a promising tool for anomaly detection-based intrusion detection systems aimed at building secure computer networks. ML approaches are increasingly adopted over heuristic approaches for cybersecurity because they learn directly from data. Data is therefore critical to the development of ML systems and becomes a prime target for attackers. Data poisoning, or contamination, is one of the most common techniques used to fool ML models through their training data. This paper evaluates the robustness of six recent deep learning algorithms for intrusion detection on contaminated data. Our experiments suggest that the state-of-the-art algorithms used in this study are sensitive to data contamination, highlighting the importance of building self-defense against data perturbation into novel models, especially for intrusion detection systems.
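To make the notion of data contamination concrete, the sketch below shows one common form of poisoning, label flipping, applied to a binary intrusion-detection training set. This is an illustrative assumption, not the paper's contamination procedure: the contamination rate, label encoding, and function names are hypothetical.

```python
# Illustrative sketch only: label-flipping contamination of a training set,
# one common form of data poisoning. The 10% rate, the 0/1 label encoding
# (benign vs. attack), and the function name are hypothetical choices, not
# taken from the paper.
import numpy as np

def contaminate_labels(y_train: np.ndarray, rate: float = 0.1,
                       seed: int = 0) -> np.ndarray:
    """Flip the binary labels of a random fraction `rate` of training samples."""
    rng = np.random.default_rng(seed)
    y_poisoned = y_train.copy()
    n_flip = int(rate * len(y_train))
    idx = rng.choice(len(y_train), size=n_flip, replace=False)
    y_poisoned[idx] = 1 - y_poisoned[idx]  # swap benign <-> attack labels
    return y_poisoned

# Example: poison 10% of labels before training an intrusion detector,
# then compare the detector's performance against training on clean labels.
y_train = np.random.randint(0, 2, size=1000)
y_dirty = contaminate_labels(y_train, rate=0.10)
```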