Network Intrusion Detection Systems (NIDS) are essential for detecting malicious traffic and cyberattacks in modern networks. Artificial intelligence-based NIDS are powerful tools that can learn complex data correlations for accurate attack prediction. Graph Neural Networks (GNNs) make it possible to analyze network topology together with flow features, which makes them particularly suitable for NIDS applications. However, successful application of such tools requires large amounts of carefully collected and labeled data for training and testing. In this paper we inspect different versions of the ToN-IoT dataset and point out inconsistencies in some of them. We filter the full version of ToN-IoT and present a new version labeled ToN-IoT-R. To ensure generalization, we propose a new standardized and compact set of flow features derived solely from NetFlow v5-compatible data. We separate numeric data and flags into different categories and propose a new dataset-agnostic normalization approach for numeric features. This preserves the meaning of flow flags and allows targeted analysis based on, for instance, network protocols. For flow classification we use the E-GraphSAGE algorithm with a modified node initialization technique that adds node degree to the node features. We achieve high classification accuracy on ToN-IoT-R and compare it with previously published results for ToN-IoT, NF-ToN-IoT, and NF-ToN-IoT-v2. We highlight the importance of careful data collection and labeling and of an appropriate choice of data preprocessing. We conclude that the proposed feature set is better suited to real NIDS because it places lower demands on traffic monitoring equipment while preserving high flow classification accuracy.
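As an illustration of the node initialization idea mentioned above, the following minimal sketch builds initial node features that include node degree. The abstract does not specify the exact construction, so several details here are assumptions: the baseline is taken to be a constant all-ones vector, degree is counted as the number of flows incident to a host (treating each flow as an undirected edge), and the degree is appended unnormalized. The function name `init_node_features` and the parameter `dim` are hypothetical.

```python
def init_node_features(edges, num_nodes, dim=4):
    """Return one feature vector per node: `dim` ones followed by the node degree.

    Assumption: each flow is an undirected edge (src, dst) between host indices,
    and parallel flows between the same pair of hosts each count toward degree.
    """
    degree = [0] * num_nodes
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1
    # Constant part (assumed baseline) plus the raw degree as the last entry.
    return [[1.0] * dim + [float(degree[n])] for n in range(num_nodes)]


# Toy flow graph: four hosts, four flows (two of them between hosts 0 and 1).
flows = [(0, 1), (0, 2), (0, 1), (2, 3)]
features = init_node_features(flows, num_nodes=4)
```

In this toy graph host 0 takes part in three flows, so its feature vector ends with 3.0. A real pipeline might scale or log-transform the degree before use; that choice is left open here.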