Machine Learning (ML) has become a valuable asset for solving many real-world tasks. For Network Intrusion Detection (NID), however, practitioners still view scientific advances in ML with skepticism. This disconnect stems from the intrinsically limited scope of research papers, many of which primarily aim to demonstrate new methods "outperforming" prior work, often overlooking the practical implications of deploying the proposed solutions in real systems. Yet the value of ML for NID depends on a plethora of factors, such as hardware, that are often neglected in the scientific literature. This paper aims to reduce practitioners' skepticism towards ML for NID by "changing" the evaluation methodology adopted in research. After elucidating which "factors" influence the operational deployment of ML in NID, we propose the notion of "pragmatic assessment", which enables practitioners to gauge the real value of ML methods for NID. We then show that the current state of research hardly allows one to estimate the value of ML for NID. As a constructive step forward, we carry out a pragmatic assessment ourselves: we re-assess existing ML methods for NID, focusing on the classification of malicious network traffic, and consider hundreds of configuration settings, diverse adversarial scenarios, and four hardware platforms. Our large and reproducible evaluations enable estimating the quality of ML for NID. We also validate our claims through a user study with security practitioners.