Recent network traffic classification methods benefit from machine learning (ML) technology. However, the use of ML brings many challenges, such as a lack of high-quality annotated datasets, data drift and other effects that cause datasets and ML models to age, and high volumes of network traffic. This paper argues that it is necessary to augment traditional workflows of ML training and deployment and to adapt the Active Learning concept to network traffic analysis. The paper presents a novel Active Learning Framework (ALF) to address this topic. ALF provides ready-made software components that can be used to deploy an active learning loop and maintain an ALF instance that continuously and automatically evolves a dataset and ML model. The resulting solution is deployable for IP flow-based analysis of high-speed (100 Gb/s) networks, and also supports research experiments on different strategies and methods for annotation, evaluation, dataset optimization, etc. Finally, the paper lists some research challenges that emerge from the first experiments with ALF in practice.
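To make the notion of an active learning loop concrete, the following is a minimal sketch and not ALF's actual implementation. It assumes a pool-based setting with margin-based uncertainty sampling, a toy nearest-centroid classifier over a single hypothetical flow feature (mean packet size in bytes), and a hypothetical `annotate` function standing in for the annotator. None of these names or choices come from the paper; ALF may use different query strategies, models, and features.

```python
import random


def train_centroids(dataset):
    """Fit a toy nearest-centroid model: one mean feature value per class."""
    sums, counts = {}, {}
    for x, label in dataset:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(model, x):
    """Return the class whose centroid is closest to x."""
    return min(model, key=lambda label: abs(x - model[label]))


def uncertainty(model, x):
    """Margin-based uncertainty in [0, 1]; 1 means the two nearest
    centroids are equally far away, so the model cannot decide."""
    d = sorted(abs(x - c) for c in model.values())
    if len(d) < 2:
        return 1.0
    total = d[0] + d[1]
    return 1.0 if total == 0 else 1.0 - (d[1] - d[0]) / total


def active_learning_loop(dataset, pool, annotate, rounds=5, budget=3):
    """Repeatedly retrain, query the most uncertain samples, and grow the dataset."""
    dataset, pool = list(dataset), list(pool)
    for _ in range(rounds):
        if not pool:
            break
        model = train_centroids(dataset)
        # Query strategy (an assumption): most uncertain samples first.
        pool.sort(key=lambda x: uncertainty(model, x), reverse=True)
        queried, pool = pool[:budget], pool[budget:]
        # Annotation step: the oracle labels only the queried samples.
        dataset.extend((x, annotate(x)) for x in queried)
    return train_centroids(dataset), dataset


# Hypothetical oracle: flows with large mean packet size count as "bulk".
def annotate(x):
    return "bulk" if x >= 600 else "interactive"


rng = random.Random(0)
seed_dataset = [(100.0, "interactive"), (1400.0, "bulk")]
unlabeled_pool = [rng.uniform(40, 1500) for _ in range(30)]
model, dataset = active_learning_loop(seed_dataset, unlabeled_pool, annotate)
```

Only the queried samples are annotated in each round, so the labeling effort is bounded by `rounds * budget` regardless of how much traffic enters the pool. This is the property that makes such a loop attractive on high-volume links.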