We investigate the detection of botnet command and control (C2) hosts in massive IP traffic using machine learning methods. To this end, we use NetFlow data, the industry standard for monitoring IP traffic, and ML models built on two sets of features: conventional NetFlow variables and distributional features derived from them. In addition to static summaries of NetFlow features, we use quantiles of their IP-level distributions as input features in models that predict whether an IP belongs to a known botnet family. These models are used to develop intrusion detection systems that predict which traffic traces are associated with malicious attacks. The results are validated by matching predictions against existing denylists of published malicious IP addresses and against deep packet inspection. The use of our proposed distributional features, combined with techniques that enable modelling of complex input feature spaces, results in highly accurate predictions by our trained models.
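To make the idea of IP-level distributional features concrete, the following is a minimal sketch, not the authors' pipeline. It assumes each flow record is a dictionary with a hypothetical `ip` key and three illustrative NetFlow-style variables (`bytes`, `packets`, `duration`). The quantile levels, the interpolation rule, and the feature names are likewise assumptions chosen for illustration. For every IP it computes one static summary (the mean) and a few quantiles of each variable.

```python
from collections import defaultdict


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted, non-empty list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def ip_features(flows, variables=("bytes", "packets", "duration"),
                qs=(0.1, 0.5, 0.9)):
    """Build one feature row per IP: a mean plus quantiles per variable.

    The field names and quantile levels are illustrative assumptions,
    not values taken from the paper.
    """
    # Group the observed values of each variable by IP address.
    per_ip = defaultdict(lambda: defaultdict(list))
    for flow in flows:
        for var in variables:
            per_ip[flow["ip"]][var].append(flow[var])

    features = {}
    for ip, cols in per_ip.items():
        row = {}
        for var in variables:
            vals = sorted(cols[var])
            # Static summary of the variable for this IP.
            row[f"{var}_mean"] = sum(vals) / len(vals)
            # Distributional features: quantiles of the per-IP distribution.
            for q in qs:
                row[f"{var}_q{round(q * 100)}"] = quantile(vals, q)
        features[ip] = row
    return features


# Toy flow records using documentation-range IP addresses.
flows = [
    {"ip": "192.0.2.1", "bytes": 100, "packets": 2, "duration": 1.0},
    {"ip": "192.0.2.1", "bytes": 300, "packets": 4, "duration": 3.0},
    {"ip": "192.0.2.1", "bytes": 200, "packets": 3, "duration": 2.0},
    {"ip": "198.51.100.7", "bytes": 50, "packets": 1, "duration": 0.5},
]

features = ip_features(flows)
for ip, row in features.items():
    print(ip, row)
```

Each resulting row could then serve as the input vector for a classifier that labels the IP as belonging to a botnet family or not. The choice of classifier and labels is outside the scope of this sketch.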