To adapt to a constantly evolving landscape of cyber threats, organizations need to actively collect Indicators of Compromise (IOCs), i.e., forensic artifacts that signal that a host or network might have been compromised. IOCs can be collected through open-source and commercial structured IOC feeds. However, they can also be extracted from a myriad of unstructured threat reports written in natural language and distributed through a wide array of sources such as blogs and social media. Multiple indicator extraction tools exist that can identify IOCs in natural language reports, but their accuracy is hard to compare because large ground truth datasets are difficult to build. This work presents a novel majority vote methodology for comparing the accuracy of indicator extraction tools, which does not require a manually built ground truth. We implement our methodology in GoodFATR, an automated platform for collecting threat reports from a wealth of sources, extracting IOCs from the collected reports using multiple tools, and comparing the tools' accuracy. GoodFATR supports 6 threat report sources: RSS, Twitter, Telegram, Malpedia, APTnotes, and ChainSmith. GoodFATR continuously monitors the sources, downloads new threat reports, extracts 41 indicator types from the collected reports, and filters non-malicious indicators to output the IOCs. We run GoodFATR over 15 months to collect 472,891 reports from the 6 sources, extract 978,151 indicators from the reports, and identify 618,217 IOCs. We analyze the collected data to identify the top IOC contributors and the IOC class distribution. We apply GoodFATR to compare the IOC extraction accuracy of 7 popular open-source tools with that of GoodFATR's own indicator extraction module.
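The abstract does not spell out the voting rule, so the following Python sketch is only one plausible reading of a majority vote comparison, not GoodFATR's actual implementation. It assumes that an indicator counts as correct for a report when strictly more than half of the tools extract it, and that each tool is then scored by precision and recall against that consensus set. The function name `majority_vote`, the tool names, and the sample indicators are all hypothetical.

```python
from collections import Counter


def majority_vote(extractions):
    """Score tools against a consensus built from their own outputs.

    extractions: dict mapping a tool name to the set of indicators that
    tool extracted from a single report (hypothetical input format).
    Returns (consensus, scores), where scores maps each tool name to a
    (precision, recall) pair measured against the consensus set.
    """
    n_tools = len(extractions)

    # Count how many tools reported each indicator (one vote per tool).
    votes = Counter(ind for found in extractions.values() for ind in set(found))

    # Assumed rule: an indicator is accepted if strictly more than half
    # of the tools extracted it. The paper may use a different threshold.
    consensus = {ind for ind, n in votes.items() if n > n_tools / 2}

    scores = {}
    for tool, found in extractions.items():
        found = set(found)
        true_positives = len(found & consensus)
        precision = true_positives / len(found) if found else 0.0
        recall = true_positives / len(consensus) if consensus else 0.0
        scores[tool] = (precision, recall)
    return consensus, scores


# Hypothetical outputs of three tools on the same report.
sample = {
    "tool_a": {"1.2.3.4", "evil.example"},
    "tool_b": {"1.2.3.4", "evil.example", "readme.md"},
    "tool_c": {"1.2.3.4"},
}
consensus, scores = majority_vote(sample)
```

In this sample, `readme.md` is reported by only one of the three tools, so it falls outside the consensus and lowers the precision of `tool_b`, while `tool_c` loses recall for missing `evil.example`. No manually labeled ground truth is needed, which is the property the abstract highlights.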