Benchmarking the Benchmark -- Analysis of Synthetic NIDS Datasets

Network Intrusion Detection Systems (NIDSs) are an increasingly important tool for the prevention and mitigation of cyber attacks. A number of labelled synthetic datasets have been generated and made publicly available by researchers, and they have become the benchmarks against which new ML-based NIDS classifiers are evaluated. Recently published results show excellent classification performance on these datasets, increasingly approaching 100 percent across key evaluation metrics such as accuracy and F1 score. Unfortunately, we have not yet seen these excellent academic research results translated into practical NIDSs with such near-perfect performance. This motivated the research presented in this paper, where we analyse the statistical properties of the benign traffic in three of the more recent and relevant NIDS datasets (CIC, UNSW, ...). As a comparison, we consider two datasets obtained from real-world production networks, one from a university network and one from a medium-sized Internet Service Provider (ISP). Our results show that the two real-world datasets are quite similar to each other with regard to most of the considered statistical features. Equally, the three synthetic datasets are relatively similar within their group. However, and most importantly, our results show a distinct difference in most of the considered statistical features between the three synthetic datasets and the two real-world datasets. Since ML relies on the basic assumption that training and test datasets are sampled from the same distribution, this raises the question of how well the performance results of ML classifiers trained on the considered synthetic datasets can translate and generalise to real-world networks. We believe this is an interesting and relevant question which provides motivation for further research in this space.
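To make the distribution argument concrete, the following is a minimal sketch of one way to quantify how far apart two datasets are on a single traffic feature. It uses the two-sample Kolmogorov-Smirnov statistic, which is an assumption made here for illustration; the abstract does not say which statistical measures the paper applies. The feature values below are invented toy numbers, not measurements from any of the datasets mentioned.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute gap between the empirical CDFs of the
    two samples: 0.0 means identical empirical distributions, 1.0 means
    the samples do not overlap at all.
    """
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a = sorted(sample_a)
    b = sorted(sample_b)
    max_gap = 0.0
    # The empirical CDFs are step functions that only change at observed
    # values, so checking every observed value is sufficient.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap


# Hypothetical benign-flow durations in seconds (toy values only).
real_network_1 = [0.1, 0.4, 0.5, 2.0, 3.5]
real_network_2 = [0.2, 0.3, 0.6, 1.8, 4.0]
synthetic_set = [9.0, 10.0, 11.5, 12.0, 13.0]

gap_real_vs_real = ks_statistic(real_network_1, real_network_2)
gap_real_vs_synthetic = ks_statistic(real_network_1, synthetic_set)
```

In this toy setup the two "real" samples yield a small gap while the "synthetic" sample yields a large one, which is the qualitative pattern the abstract describes. A classifier trained on data resembling `synthetic_set` would be evaluated in deployment on inputs drawn from a visibly different distribution.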
