Because of the size and nature of intrusion detection datasets, intrusion detection systems (IDS) typically incur high computational cost when examining data features and identifying intrusive patterns. Data preprocessing techniques such as feature selection can reduce this cost by eliminating irrelevant and redundant features from the dataset. The objective of this study is to analyze the efficiency and effectiveness of two families of feature selection approaches: wrapper-based and filter-based. To that end, we design a hybrid feature selection algorithm that combines wrapper and filter selection processes. We propose a wrapper-based hybrid intrusion detection modeling approach in which a decision tree algorithm guides the selection process. Five machine learning algorithms are applied to the wrapper-based and filter-based feature selection methods to build IDS models on the UNSW-NB15 dataset. Three filter-based methods, namely information gain, gain ratio, and Relief, are used for comparison to determine the efficiency and effectiveness of the proposed approach. A fair comparison with other state-of-the-art intrusion detection approaches is also performed. The experimental results show that our approach is quite effective compared with state-of-the-art works; however, it requires more computation time than the filter-based methods while achieving similar results. Our work also reveals previously unobserved issues regarding the conformity of the UNSW-NB15 dataset.
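To make the filter/wrapper distinction concrete, the following is a minimal sketch and not the authors' implementation. It assumes discrete-valued features and uses a tiny hand-made dataset rather than UNSW-NB15. The filter side scores each feature independently with information gain or gain ratio. The wrapper side runs a greedy forward search that re-evaluates a classifier for every candidate subset. A simple majority-vote lookup classifier stands in for the decision tree here, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain(column, labels):
    """Filter criterion: label entropy minus entropy after splitting on the feature."""
    n = len(labels)
    groups = defaultdict(list)
    for value, label in zip(column, labels):
        groups[value].append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(column, labels):
    """Filter criterion: information gain normalized by the feature's own entropy."""
    split_info = entropy(column)
    return information_gain(column, labels) / split_info if split_info > 0 else 0.0


def filter_rank(rows, labels, score=information_gain):
    """Rank feature indices by an independent per-feature score (no model is trained)."""
    n_features = len(rows[0])
    scores = [score([r[j] for r in rows], labels) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def lookup_accuracy(train, y_train, valid, y_valid, subset):
    """Validation accuracy of a majority-vote lookup table on the chosen features.

    Assumption: this stand-in replaces the decision tree used in the study.
    """
    default = Counter(y_train).most_common(1)[0][0]
    votes = defaultdict(Counter)
    for row, label in zip(train, y_train):
        votes[tuple(row[j] for j in subset)][label] += 1
    hits = 0
    for row, label in zip(valid, y_valid):
        key = tuple(row[j] for j in subset)
        pred = votes[key].most_common(1)[0][0] if key in votes else default
        hits += pred == label
    return hits / len(valid)


def wrapper_forward(train, y_train, valid, y_valid, k):
    """Wrapper search: greedily add the feature that most improves model accuracy."""
    selected = []
    while len(selected) < k:
        candidates = [j for j in range(len(train[0])) if j not in selected]
        best = max(
            candidates,
            key=lambda j: lookup_accuracy(train, y_train, valid, y_valid, selected + [j]),
        )
        selected.append(best)
    return selected


# Toy data (illustrative only): feature 0 equals the label, feature 1 is
# constant, and feature 2 is unrelated to the label.
labels = [0, 0, 1, 1, 0, 1, 0, 1]
rows = [(0, 0, 0), (0, 0, 1), (1, 0, 0), (1, 0, 1),
        (0, 0, 0), (1, 0, 0), (0, 0, 1), (1, 0, 1)]

ranking = filter_rank(rows, labels)
chosen = wrapper_forward(rows[:6], labels[:6], rows[6:], labels[6:], k=1)
```

On this toy data both strategies pick feature 0 first. The cost difference is visible in the structure: the filter computes one score per feature, whereas the wrapper builds and evaluates a model for every candidate at every step.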