HTTP-based Trojan is extremely threatening, and it is difficult to be effectively detected because of its concealment and confusion. Previous detection methods usually are with poor generalization ability due to outdated datasets and reliance on manual feature extraction, which makes these methods always perform well under their private dataset, but poorly or even fail to work in real network environment. In this paper, we propose an HTTP-based Trojan detection model via the Hierarchical Spatio-Temporal Features of traffics (HSTF-Model) based on the formalized description of traffic spatio-temporal behavior from both packet level and flow level. In this model, we employ Convolutional Neural Network (CNN) to extract spatial information and Long Short-Term Memory (LSTM) to extract temporal information. In addition, we present a dataset consisting of Benign and Trojan HTTP Traffic (BTHT-2018). Experimental results show that our model can guarantee high accuracy (the F1 of 98.62%-99.81% and the FPR of 0.34%-0.02% in BTHT-2018). More importantly, our model has a huge advantage over other related methods in generalization ability. HSTF-Model trained with BTHT-2018 can reach the F1 of 93.51% on the public dataset ISCX-2012, which is 20+% better than the best of related machine learning methods.
翻译:暂无翻译