ML-based tunnel detection and tunneled application classification

Encrypted tunneling protocols are widely used. Beyond business and personal uses, malicious actors also deploy tunneling to hinder the detection of Command and Control traffic and data exfiltration. A common approach to maintaining visibility into tunneling is to rely on network traffic metadata and machine learning to detect tunnels without decrypting the data. Existing work on tunneling protocols, however, exhibits several weaknesses: it aims to identify the applications running inside tunnels rather than the tunnels themselves; it covers a limited set of protocols (e.g., OpenVPN and WireGuard are not addressed); and it relies on inconsistent features and diverse machine learning techniques, which makes performance comparison difficult. Our work makes four contributions that address these limitations and provide further analysis. First, we address OpenVPN and WireGuard. Second, we propose a complete pipeline to detect and classify tunneling protocols and tunneled applications. Third, we present a thorough analysis of the performance of both network traffic metadata features and machine learning techniques. Fourth, we provide a novel analysis of domain generalization with respect to untunneled background traffic, and of both domain generalization and adversarial learning with respect to the Maximum Transmission Unit (MTU).
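To make the notion of "network traffic metadata" concrete, the following is a minimal sketch of how per-flow features might be computed without touching payload contents. It is an illustration under stated assumptions, not the feature set used in this work: the function name `flow_metadata_features`, the packet tuple layout, and the particular statistics are all hypothetical choices.

```python
from statistics import mean, pstdev


def flow_metadata_features(packets):
    """Summarize one flow using only packet metadata (no decryption).

    Hypothetical input layout (an assumption, not taken from the paper):
    each packet is (timestamp_seconds, length_bytes, direction), where
    direction is +1 for client-to-server and -1 for server-to-client.
    """
    if not packets:
        raise ValueError("flow has no packets")

    # Order by timestamp so inter-arrival gaps are well defined.
    ordered = sorted(packets, key=lambda p: p[0])
    times = [ts for ts, _, _ in ordered]
    sizes = [size for _, size, _ in ordered]
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    upstream_bytes = sum(size for _, size, d in ordered if d > 0)
    total_bytes = sum(sizes)

    return {
        "packet_count": len(ordered),
        "mean_size": mean(sizes),
        "std_size": pstdev(sizes),
        # The largest packet is bounded by the path MTU, so this feature
        # shifts whenever the MTU changes.
        "max_size": max(sizes),
        "mean_gap": mean(gaps) if gaps else 0.0,
        "upstream_byte_ratio": upstream_bytes / total_bytes if total_bytes else 0.0,
    }
```

A feature dictionary of this kind could then be fed to any classifier. Size-based features such as `max_size` depend on the MTU, which is one plausible reason a model trained under one MTU setting may not transfer to another.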
