A large number of network security breaches in IoT networks have demonstrated the unreliability of current Network Intrusion Detection Systems (NIDSs). Consequently, network interruptions and loss of sensitive data have occurred, which led to an active research area for improving NIDS technologies. In an analysis of related works, it was observed that most researchers aim to obtain better classification results by using a set of untried combinations of Feature Reduction (FR) and Machine Learning (ML) techniques on NIDS datasets. However, these datasets are different in feature sets, attack types, and network design. Therefore, this paper aims to discover whether these techniques can be generalised across various datasets. Six ML models are utilised: a Deep Feed Forward (DFF), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), Decision Tree (DT), Logistic Regression (LR), and Naive Bayes (NB). The accuracy of three Feature Extraction (FE) algorithms; Principal Component Analysis (PCA), Auto-encoder (AE), and Linear Discriminant Analysis (LDA), are evaluated using three benchmark datasets: UNSW-NB15, ToN-IoT and CSE-CIC-IDS2018. Although PCA and AE algorithms have been widely used, the determination of their optimal number of extracted dimensions has been overlooked. The results indicate that no clear FE method or ML model can achieve the best scores for all datasets. The optimal number of extracted dimensions has been identified for each dataset, and LDA degrades the performance of the ML models on two datasets. The variance is used to analyse the extracted dimensions of LDA and PCA. Finally, this paper concludes that the choice of datasets significantly alters the performance of the applied techniques. We believe that a universal (benchmark) feature set is needed to facilitate further advancement and progress of research in this field.
翻译:在IOT网络中,大量网络安全漏洞表明当前网络入侵探测系统(NIDS)不可靠,因此,网络中断和丢失敏感数据的情况已经发生,从而导致一个改进NIDS技术的积极研究领域。在对相关工作的分析中,发现大多数研究人员的目标是通过在NIDS数据集中使用一套未经尝试的特性减少(FR)和机器学习(ML)技术组合来获得更好的分类结果。然而,这些数据集在特性集、攻击类型和网络设计方面是不同的。因此,本文件旨在查明这些技术是否可以在各种数据集中被概括化。6个MLL模型模型模型模型模型的中断和丢失,从而导致改进NIDS技术。