Internet of Things (IoT) networks have become an increasingly attractive target for cyberattacks. Powerful Machine Learning (ML) models have recently been adopted to implement network intrusion detection systems that protect IoT networks. To train such ML models successfully, selecting the right data features is crucial for maximising both detection accuracy and computational efficiency. This paper comprehensively analyses the importance and predictive power of feature sets for detecting network attacks. Three feature selection algorithms (chi-square, information gain and correlation) are used to identify and rank data features. The ranked features are then fed into two ML classifiers, a deep feed-forward neural network and a random forest, to measure their attack detection performance. The experimental evaluation considered three datasets in their proprietary flow format: UNSW-NB15, CSE-CIC-IDS2018, and ToN-IoT. In addition, the respective variants in NetFlow format were considered, i.e., NF-UNSW-NB15, NF-CSE-CIC-IDS2018, and NF-ToN-IoT. The evaluation explored the marginal benefit of adding individual features in ranked order. Our results show that accuracy initially increases rapidly as features are added but quickly converges to its maximum. This demonstrates significant potential to reduce the computational and storage cost of intrusion detection systems while maintaining near-optimal detection accuracy. This is particularly relevant to IoT systems, which typically have limited computational and storage resources.
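The rank-then-add-features procedure described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it uses only the chi-square ranking, runs on synthetic binary features instead of the listed datasets, and replaces the deep feed-forward and random forest classifiers with a simple majority-vote lookup table as a stand-in. All function names (`make_synthetic_flows`, `chi_square_score`, `rank_features`, `lookup_accuracy`, `accuracy_curve`) are hypothetical and introduced here for illustration only.

```python
import random
from collections import Counter


def make_synthetic_flows(n=2000, seed=0):
    """Synthetic stand-in data: 3 informative binary features, 3 noise features."""
    rng = random.Random(seed)
    flip_probs = [0.1, 0.2, 0.3]  # assumption: lower flip probability = more informative
    rows, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)  # 0 = benign, 1 = attack
        informative = [y ^ (rng.random() < p) for p in flip_probs]
        noise = [rng.randint(0, 1) for _ in range(3)]
        rows.append(informative + noise)
        labels.append(y)
    return rows, labels


def chi_square_score(column, labels):
    """Chi-square statistic between one categorical feature and the class labels."""
    n = len(labels)
    observed = Counter(zip(column, labels))
    value_totals = Counter(column)
    label_totals = Counter(labels)
    score = 0.0
    for value, value_count in value_totals.items():
        for label, label_count in label_totals.items():
            expected = value_count * label_count / n
            score += (observed[(value, label)] - expected) ** 2 / expected
    return score


def rank_features(rows, labels):
    """Return feature indices sorted from highest to lowest chi-square score."""
    n_features = len(rows[0])
    scores = [chi_square_score([r[j] for r in rows], labels) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def lookup_accuracy(train_rows, train_labels, test_rows, test_labels, selected):
    """Stand-in classifier: majority training label per combination of selected values."""
    default = Counter(train_labels).most_common(1)[0][0]
    votes = {}
    for row, y in zip(train_rows, train_labels):
        key = tuple(row[j] for j in selected)
        votes.setdefault(key, Counter())[y] += 1
    correct = 0
    for row, y in zip(test_rows, test_labels):
        key = tuple(row[j] for j in selected)
        pred = votes[key].most_common(1)[0][0] if key in votes else default
        correct += pred == y
    return correct / len(test_labels)


def accuracy_curve(train_rows, train_labels, test_rows, test_labels):
    """Rank features on the training split, then score the top-k subsets for every k."""
    ranking = rank_features(train_rows, train_labels)
    curve = [
        lookup_accuracy(train_rows, train_labels, test_rows, test_labels, ranking[:k])
        for k in range(1, len(ranking) + 1)
    ]
    return ranking, curve


if __name__ == "__main__":
    rows, labels = make_synthetic_flows()
    split = int(0.7 * len(rows))  # 70/30 train/test split, an arbitrary choice
    ranking, curve = accuracy_curve(rows[:split], labels[:split], rows[split:], labels[split:])
    for k, acc in enumerate(curve, start=1):
        print(f"top-{k} features {ranking[:k]}: accuracy={acc:.3f}")
```

Printing the curve shows how much each additional ranked feature contributes. Swapping `chi_square_score` for another scoring function, or `lookup_accuracy` for a real classifier, leaves the rest of the loop unchanged.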