Industrial Internet of Things (IIoT) networks have become an increasingly attractive target for cyberattacks. Powerful Machine Learning (ML) models have recently been adopted to implement Network Intrusion Detection Systems (NIDSs) that can protect IIoT networks. Successful training of such ML models depends on selecting the right set of data features, one that maximises both detection accuracy and computational efficiency. This paper provides an extensive analysis of optimal feature sets in terms of their importance and their predictive power for detecting network attacks. Three feature selection algorithms (chi-square, information gain and correlation) have been utilised to identify and rank data features. The selected features are fed into two ML classifiers (a deep feed-forward network and a random forest) to measure their attack detection accuracy. The experimental evaluation considered three NIDS datasets in their proprietary flow format: UNSW-NB15, CSE-CIC-IDS2018 and ToN-IoT. In addition, their respective variants in NetFlow format were considered, i.e., NF-UNSW-NB15, NF-CSE-CIC-IDS2018 and NF-ToN-IoT. The experimental evaluation explored the marginal benefit of adding features one by one. Our results show that accuracy initially increases rapidly as features are added but quickly converges to the maximum achievable detection accuracy. This demonstrates a significant potential to reduce the computational and storage cost of NIDSs while maintaining near-optimal detection accuracy, which is particularly relevant for IIoT systems with typically limited computational and storage resources.
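The rank-then-add-one-by-one evaluation described above can be sketched as follows. This is a minimal, standard-library-only illustration under stated assumptions, not the authors' implementation: features are assumed to be already discretised (binned) integers, ranking uses information gain (one of the three selectors named in the abstract), and a 1-nearest-neighbour classifier with Hamming distance stands in for the random forest and deep feed-forward models used in the paper. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence

Row = Sequence[int]


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column: Sequence[int], labels: Sequence[int]) -> float:
    """H(Y) - H(Y | X) for one discrete feature column X."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(column).items():
        subset = [y for x, y in zip(column, labels) if x == value]
        conditional += (count / n) * entropy(subset)
    return entropy(labels) - conditional


def rank_by_information_gain(X: List[Row], y: Sequence[int]) -> List[int]:
    """Feature indices ordered from highest to lowest information gain."""
    n_features = len(X[0])
    scores = [information_gain([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def one_nn_predict(train_X: List[Row], train_y: Sequence[int], test_X: List[Row]) -> List[int]:
    """Hypothetical stand-in classifier: 1-nearest neighbour, Hamming distance."""
    preds = []
    for row in test_X:
        dists = [sum(a != b for a, b in zip(row, t)) for t in train_X]
        preds.append(train_y[dists.index(min(dists))])
    return preds


def incremental_accuracy(
    train_X: List[Row],
    train_y: Sequence[int],
    test_X: List[Row],
    test_y: Sequence[int],
    ranking: List[int],
    predict: Callable = one_nn_predict,
) -> List[float]:
    """Test accuracy using the top-k ranked features, for k = 1..len(ranking)."""
    curve = []
    for k in range(1, len(ranking) + 1):
        cols = ranking[:k]
        sub_train = [[row[j] for j in cols] for row in train_X]
        sub_test = [[row[j] for j in cols] for row in test_X]
        preds = predict(sub_train, train_y, sub_test)
        correct = sum(p == t for p, t in zip(preds, test_y))
        curve.append(correct / len(test_y))
    return curve


# Toy data (not from any NIDS dataset): feature 0 matches the label,
# feature 1 is weakly related noise, feature 2 is constant.
train_X = [[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1], [0, 1, 1], [1, 0, 1]]
train_y = [0, 0, 1, 1, 0, 1]
test_X = [[0, 1, 1], [1, 1, 1], [0, 0, 1], [1, 1, 1]]
test_y = [0, 1, 0, 1]

ranking = rank_by_information_gain(train_X, train_y)
curve = incremental_accuracy(train_X, train_y, test_X, test_y, ranking)

if __name__ == "__main__":
    print("ranking:", ranking)
    print("accuracy per number of features:", curve)
```

On this toy data the top-ranked feature alone already reaches the maximum accuracy, and adding the remaining features brings no further gain, mirroring in miniature the convergence behaviour the abstract reports. Chi-square or absolute-correlation scores could be swapped in for `information_gain` inside the ranking function without changing the incremental loop.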