Internet traffic classification is widely used to facilitate network management. It plays a crucial role in Quality of Service (QoS), Quality of Experience (QoE), network visibility, intrusion detection, and traffic trend analysis. While there is no theoretical guarantee that deep learning (DL)-based solutions perform better than classic machine learning (ML)-based ones, DL-based models have become the common default. This paper compares well-known DL-based and ML-based models and shows that, in the case of malicious traffic classification, state-of-the-art DL-based solutions do not necessarily outperform classic ML-based ones. We exemplify this finding using two well-known datasets across a varied set of tasks: malware detection, malware family classification, zero-day attack detection, and classification of an iteratively growing dataset. Note that it is not feasible to evaluate every possible model and thereby make a definitive statement. Thus, this finding is not a recommendation to avoid DL-based models, but rather empirical evidence that in some cases simpler solutions may perform even better.