Most research on machine learning (ML) for network intrusion detection systems (NIDS) relies on well-established datasets such as KDD-CUP99, NSL-KDD, UNSW-NB15, and CICIDS-2017. Work in this setting explores what machine learning techniques can achieve, aiming to improve metrics over published baselines (a model-centric approach). However, these datasets have limitations, such as aging, that make it infeasible to transfer those ML-based solutions to real-world applications. This paper presents a systematic data-centric approach to address the current limitations of NIDS research, specifically those of its datasets. The approach generates NIDS datasets composed of the most recent network traffic and attacks, with the labeling process integrated by design.