With massive volumes of data generated daily and the ever-increasing interconnectivity of the world's Internet infrastructure, machine-learning-based intrusion detection systems (IDSs) have become a vital component in protecting our economic and national security. In this paper, we perform a comprehensive study of NSL-KDD, a network traffic dataset, by visualizing patterns and employing different learning-based models to detect cyber attacks. Unlike previous shallow learning and deep learning approaches that use a single model for intrusion detection, we adopt a hierarchical strategy in which traffic is first classified as intrusion or normal behavior, and the specific attack type is then classified. We demonstrate the advantage of an unsupervised representation learning model in the binary intrusion detection task. In addition, we alleviate the data imbalance problem in 4-class classification with the SVM-SMOTE oversampling technique, and further demonstrate both the effectiveness and the drawback of the oversampling mechanism with a deep neural network as the base model.
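The hierarchical strategy described in the abstract can be illustrated with a minimal sketch. It is not the models used in the paper: the `NearestCentroid` class is a hypothetical stand-in for the learning-based models, the `HierarchicalIDS` wrapper and its parameter names are assumptions made for illustration, and the two-feature toy data and the "DoS" and "Probe" labels only mimic the shape of NSL-KDD records. The sketch shows only the control flow, in which a first stage separates normal traffic from attacks and a second stage assigns an attack type to the flagged records.

```python
from collections import defaultdict
from math import dist


class NearestCentroid:
    """Toy stand-in classifier (assumption): predicts the label of the closest class mean."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        # Column-wise mean of the rows belonging to each class.
        self.centroids_ = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()
        }
        return self

    def predict(self, X):
        return [
            min(self.centroids_, key=lambda c: dist(row, self.centroids_[c]))
            for row in X
        ]


class HierarchicalIDS:
    """Stage 1: normal vs. attack. Stage 2: attack type, for flagged rows only."""

    def __init__(self, normal_label="normal"):
        self.normal_label = normal_label
        self.detector = NearestCentroid()  # binary intrusion detector
        self.typer = NearestCentroid()     # attack-type classifier

    def fit(self, X, y):
        # Collapse every attack type into one "attack" class for stage 1.
        binary = [lbl if lbl == self.normal_label else "attack" for lbl in y]
        self.detector.fit(X, binary)
        # Stage 2 is trained on attack records only.
        attacks = [(row, lbl) for row, lbl in zip(X, y) if lbl != self.normal_label]
        self.typer.fit([r for r, _ in attacks], [l for _, l in attacks])
        return self

    def predict(self, X):
        predictions = []
        for row, flag in zip(X, self.detector.predict(X)):
            if flag == self.normal_label:
                predictions.append(flag)
            else:
                predictions.append(self.typer.predict([row])[0])
        return predictions


# Made-up two-feature records; real NSL-KDD records have many more features.
X_train = [[0.0, 0.0], [0.2, 0.1], [0.1, -0.2],
           [5.0, 5.0], [5.2, 4.9], [5.0, -5.0], [4.8, -5.1]]
y_train = ["normal", "normal", "normal", "DoS", "DoS", "Probe", "Probe"]

model = HierarchicalIDS().fit(X_train, y_train)
print(model.predict([[0.1, 0.0], [5.1, 5.0], [4.9, -4.8]]))
```

One consequence of this design is that the second stage never sees normal traffic, so its class balance depends only on the attack records. This is where an oversampling step such as SVM-SMOTE would be applied in a fuller pipeline.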