This work presents Reliable-NIDS (R-NIDS), a novel methodology for Machine Learning (ML) based Network Intrusion Detection Systems (NIDSs) that allows ML models to work on integrated datasets, enriching the learning process with diverse information from different datasets. R-NIDS thus targets the design of more robust models that generalize better than traditional approaches. We also propose a new dataset, called UNK21. It is built from three of the most well-known network datasets (UGR'16, UNSW-NB15 and NSL-KDD), each gathered from its own network environment with different features and classes, using the data aggregation approach that is part of R-NIDS. Following R-NIDS, we build two well-known ML models (a linear and a non-linear one) on the information from these three datasets, which are among the most common in the literature for NIDS evaluation. The results show that both models, trained as NIDS solutions, benefit from this approach and generalize better when trained on the newly proposed UNK21 dataset. Furthermore, these results are carefully analyzed with statistical tools that provide high confidence in our conclusions.