The ability to analyse and differentiate network protocol traffic is crucial for Telcos to manage network resources and provide differentiated services. Automated Protocol Analysis (APA) is key to significantly improving efficiency and reducing reliance on human experts. There are numerous state-of-the-art automated unsupervised methods for clustering unknown protocols in APA. However, many of these methods have not been sufficiently evaluated on diverse test datasets, and thus fail to demonstrate that they generalise robustly. This study proposed a comprehensive framework to evaluate various combinations of feature extraction and clustering methods in APA. It also proposed a novel approach to automate the selection of dataset-dependent model parameters for feature extraction, resulting in improved performance. Promising results from a novel field-based tokenisation approach also led to our proposal of a novel automated hybrid approach for feature extraction and clustering of unknown protocols in APA. Our proposed hybrid approach performed best on 7 of the 9 diverse test datasets, demonstrating that it generalises robustly across diverse unknown protocols. It also outperformed the unsupervised clustering technique in the state-of-the-art open-source APA tool, NETZOB, on all test datasets.