Despite the use of machine learning for many network traffic analysis tasks in security, from application identification to intrusion detection, the aspects of the machine learning pipeline that ultimately determine the performance of the model -- feature selection and representation, model selection, and parameter tuning -- remain manual and painstaking. This paper presents a method to automate many aspects of traffic analysis, making it easier to apply machine learning techniques to a wider variety of traffic analysis tasks. We introduce nPrint, a tool that generates a unified packet representation that is amenable to representation learning and model training. We integrate nPrint with automated machine learning (AutoML), resulting in nPrintML, a public system that largely eliminates manual feature extraction and model tuning for a wide variety of traffic analysis tasks. We have evaluated nPrintML on eight separate traffic analysis tasks and released nPrint, nPrintML, and the corresponding datasets from our evaluation to enable future work to extend these methods.
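To make the idea of a unified packet representation concrete, the sketch below shows one way a packet could be mapped to a fixed-length vector that a model can consume without hand-built features. It is an illustration only: the abstract does not specify nPrint's actual encoding, and the header layout, the sizes in `HEADER_BITS`, the `ABSENT` filler value, and the function names are all assumptions made for this example.

```python
from typing import Dict, List

# Hypothetical layout: every packet is encoded over the same ordered set of
# headers, each with a fixed bit width (sizes assume no header options).
LAYOUT = ["ipv4", "tcp", "udp"]
HEADER_BITS = {"ipv4": 160, "tcp": 160, "udp": 64}
ABSENT = -1  # assumed filler for headers (or bits) the packet does not carry


def bytes_to_bits(data: bytes, width: int) -> List[int]:
    """Expand raw header bytes into a list of 0/1 bits, truncated or padded to `width`."""
    bits = [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]
    bits = bits[:width]
    bits.extend([ABSENT] * (width - len(bits)))
    return bits


def encode_packet(headers: Dict[str, bytes]) -> List[int]:
    """Encode one packet as a fixed-length vector, regardless of which headers it has."""
    vector: List[int] = []
    for name in LAYOUT:
        raw = headers.get(name)
        if raw is None:
            # Header not present in this packet: fill its slot with ABSENT.
            vector.extend([ABSENT] * HEADER_BITS[name])
        else:
            vector.extend(bytes_to_bits(raw, HEADER_BITS[name]))
    return vector


# A UDP-only packet and a TCP-only packet yield vectors of identical length.
udp_vec = encode_packet({"udp": bytes([0x80] + [0] * 7)})
tcp_vec = encode_packet({"tcp": bytes(20)})
```

Because every packet produces a vector of the same length and field alignment, the vectors can be stacked into a matrix and handed to an automated model search without per-task feature engineering, which is the property the abstract attributes to the combination of nPrint and AutoML.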