The cyberthreat landscape is continuously evolving. Hence, continuous monitoring and sharing of threat intelligence have become a priority for organizations. Threat reports, published by cybersecurity vendors, contain detailed descriptions of attack Tactics, Techniques, and Procedures (TTPs) written in an unstructured text format. Extracting TTPs from these reports helps cybersecurity practitioners and researchers learn and adapt to evolving attacks and plan threat mitigation. Researchers have proposed TTP extraction methods in the literature; however, not all of these methods have been compared to one another or to a baseline. \textit{The goal of this study is to help cybersecurity researchers and practitioners choose attack technique extraction methods for monitoring and sharing threat intelligence by comparing the underlying methods from the TTP extraction studies in the literature.} In this work, we identify ten existing TTP extraction studies from the literature and implement five methods from the ten studies. We find that two methods, based on Term Frequency-Inverse Document Frequency (TFIDF) and Latent Semantic Indexing (LSI), outperform the other three, with F1 scores of 84\% and 83\%, respectively. We observe that the F1 score of every method drops as the number of class labels increases exponentially. We also implement and evaluate an oversampling strategy to mitigate class imbalance and find that oversampling improves the classification performance of TTP extraction. Based on our findings, we provide recommendations for future cybersecurity researchers, such as constructing a benchmark dataset from a large corpus and selecting textual features of TTPs. Our work, along with the dataset and implementation source code, can serve as a baseline for cybersecurity researchers to test and compare the performance of future TTP extraction methods.
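To make the two ideas named above concrete, the following is a minimal sketch of TFIDF-based text classification combined with random oversampling. It is an illustration under stated assumptions, not the pipeline evaluated in the study: the k-nearest-neighbour classifier, the random-duplication oversampling strategy, the toy sentences, the labels, and all function names are hypothetical choices made for this example.

```python
import math
import random
from collections import Counter

def tokenize(text):
    # Naive whitespace tokenizer (assumption; real pipelines usually do more).
    return text.lower().split()

def fit_idf(docs):
    """Smoothed inverse document frequency for every term seen in docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}

def tfidf_vector(text, idf):
    """Sparse, L2-normalised TF-IDF vector as a dict of term -> weight."""
    tf = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}

def cosine(a, b):
    # Both vectors are already L2-normalised, so the dot product is the cosine.
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def oversample(texts, labels, seed=0):
    """Randomly duplicate minority-class texts until every class matches the largest one."""
    rng = random.Random(seed)
    by_label = {}
    for text, label in zip(texts, labels):
        by_label.setdefault(label, []).append(text)
    target = max(len(items) for items in by_label.values())
    out_texts, out_labels = [], []
    for label, items in by_label.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out_texts.extend(items + extra)
        out_labels.extend([label] * target)
    return out_texts, out_labels

def knn_predict(query, texts, labels, k=3):
    """Majority vote among the k training texts most similar to the query."""
    idf = fit_idf(texts)
    q = tfidf_vector(query, idf)
    scores = [cosine(q, tfidf_vector(t, idf)) for t in texts]
    top = sorted(range(len(texts)), key=lambda i: scores[i], reverse=True)[:k]
    return Counter(labels[i] for i in top).most_common(1)[0][0]

# Toy, imbalanced training set (hypothetical sentences and labels).
texts = [
    "attacker sent spearphishing email with malicious attachment",
    "phishing email lured the victim to open an attachment",
    "victims received an email containing a malicious link",
    "malware executed a powershell script to download payload",
]
labels = ["phishing", "phishing", "phishing", "scripting"]
query = "the implant runs a powershell script"

balanced_texts, balanced_labels = oversample(texts, labels)
print(knn_predict(query, texts, labels))
print(knn_predict(query, balanced_texts, balanced_labels))
```

On this toy data, the single minority-class example is outvoted by majority-class neighbours before oversampling, while the balanced training set lets the minority class win the vote. This is only meant to illustrate why class imbalance can hurt a classifier; it does not reproduce any result reported here.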