Deep Learning and Traffic Classification: Lessons learned from a commercial-grade dataset with hundreds of encrypted and zero-day applications

The increasing success of Machine Learning (ML) and Deep Learning (DL) has recently rekindled interest in traffic classification. While classification of known traffic is a well-investigated subject, for which supervised classification tools (such as ML and DL models) are known to provide satisfactory performance, detection of unknown (or zero-day) traffic is more challenging and is typically handled by unsupervised techniques (such as clustering algorithms). In this paper, we share our experience with a commercial-grade DL traffic classification engine that is able to (i) identify known applications from encrypted traffic and (ii) handle unknown zero-day applications. In particular, our contribution for (i) is a thorough assessment of state-of-the-art traffic classifiers in commercial-grade settings comprising a few thousand very fine-grained application labels, as opposed to the few tens of classes generally targeted in academic evaluations. Additionally, we contribute to (ii), the detection of zero-day applications, by proposing a novel technique, tailored for DL models, that is significantly more accurate and lightweight than the state of the art. Summarizing our main findings, we gather that (i) ML and DL models are equally able to provide a satisfactory solution for the classification of known traffic, whereas (ii) the non-linear feature extraction process of the DL backbone provides sizeable advantages for the detection of unknown classes.
