Network traffic classification, the task of identifying the type of network traffic, is a fundamental step in improving network services and managing modern networks. Classical machine learning and deep learning methods are well developed in this field. However, two major challenges remain: protecting the privacy of users' traffic data, and the difficulty of obtaining labeled data in practice. In this paper, we propose a novel approach that uses federated semi-supervised learning for network traffic classification. In our approach, a federated server and several clients jointly train a global classification model: the clients use unlabeled data, and the server uses labeled data. Moreover, we use two traffic subflow sampling methods, simple sampling and incremental sampling, for data preprocessing. Experimental results on the QUIC dataset show that our federated semi-supervised approach reaches an accuracy of 91.08% with simple sampling and 97.81% with incremental sampling. The results also show that the accuracy gap between our method and centralized training is minimal, while our method protects users' privacy and does not require a large amount of labeled data.
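To make the client/server split more concrete, the following is a minimal sketch of two building blocks that such a scheme could use. The abstract does not state how client updates are aggregated or how unlabeled flows are exploited, so both choices here are assumptions: sample-weighted federated averaging (FedAvg) for aggregation, and confidence-thresholded pseudo-labeling on the clients. The names `fed_avg` and `pseudo_label`, the threshold of 0.9, and the toy numbers are hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Sequence


def fed_avg(client_weights: Sequence[Sequence[float]],
            client_sizes: Sequence[int]) -> List[float]:
    """Average client parameter vectors, weighted by local sample counts.

    Assumption: FedAvg-style aggregation; the paper may use another rule.
    """
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("client_sizes must sum to a positive number")
    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        for i, w in enumerate(weights):
            averaged[i] += w * size / total
    return averaged


def pseudo_label(class_probs: Sequence[float],
                 threshold: float = 0.9) -> Optional[int]:
    """Return the predicted class index if the model is confident, else None.

    Assumption: clients turn unlabeled flows into training targets only when
    the current global model's top probability reaches the threshold.
    """
    best = max(range(len(class_probs)), key=lambda i: class_probs[i])
    return best if class_probs[best] >= threshold else None


# Two hypothetical clients holding 100 and 300 unlabeled flows.
global_weights = fed_avg([[1.0, 2.0], [3.0, 6.0]], [100, 300])

# A confident prediction is kept as a pseudo-label; an uncertain one is skipped.
confident = pseudo_label([0.05, 0.92, 0.03])
uncertain = pseudo_label([0.40, 0.35, 0.25])
```

In a full training round under these assumptions, each client would update its local copy of the model on its pseudo-labeled flows, the server would combine the client parameters with `fed_avg`, and the server would then refine the aggregated model on its labeled data before broadcasting it again.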