Classifying network traffic is the basis for important network applications. Prior research in this area has been held back by the limited availability of representative datasets, and many published results cannot be readily reproduced. The problem is exacerbated by emerging data-driven, machine-learning-based approaches. To address this issue, we present the (Net)² database, comprising three open datasets that contain nearly 1.3M labeled flows in total, with a comprehensive list of flow features, for the research community. We cover broad aspects of network traffic analysis, including both malware detection and application classification. As we continue to grow the datasets, we expect them to serve as a common ground for AI-driven, reproducible research on network flow analytics. We release the datasets publicly and also introduce a Multi-Task Hierarchical Learning (MTHL) model that performs all tasks in a single model. Our results show that MTHL can accurately perform multiple tasks with hierarchical labeling while dramatically reducing training time.