Open source intelligence (OSINT) is a powerful tool that cybersecurity analysts use to gather information, both for analyzing discovered vulnerabilities and for detecting novel cybersecurity threats and exploits. However, the volume of security-relevant information on the internet is continually increasing, and it is intractable for analysts to parse comprehensively. Methods that condense the available open source intelligence and automatically develop connections between disparate sources of information are therefore highly valuable. In this research, we present a system that constructs a Neo4j graph database formed by shared connections between open source intelligence texts, including blogs, cybersecurity bulletins, news sites, antivirus scans, social media posts (e.g., Reddit and Twitter), and threat reports. These connections comprise possible indicators of compromise (IoCs; e.g., IP addresses, domains, hashes, email addresses, phone numbers), information on known exploits and techniques (e.g., CVEs and MITRE ATT&CK Technique IDs), and potential sources of information on cybersecurity exploits, such as Twitter usernames. We detail the construction of the database of potential IoCs, including the addition of machine learning and metadata that can be used to filter the data for a specific domain (for example, a specific natural language) when needed. We give examples of using the graph database to query connections between known malicious IoCs and open source intelligence documents, including threat reports. We show that this type of relationship querying can allow for more effective use of open source intelligence for threat hunting, malware family clustering, and vulnerability analysis.
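The idea of linking documents through shared indicators can be illustrated with a minimal Python sketch. It is not the system presented in this work: the regular expressions are simplified assumptions, the function names (`extract_indicators`, `build_graph`, `shared_connections`, `to_cypher`) are hypothetical, and the `Document`, `Indicator`, and `MENTIONS` names in the Cypher text are an assumed schema chosen only for illustration. The sketch extracts a few indicator types from text, groups documents by the indicators they mention, and produces a parameterized Cypher `MERGE` statement that could be passed to a Neo4j client.

```python
import re
from collections import defaultdict

# Simplified, assumed patterns; production extraction needs stricter
# validation (defanged indicators, domains, phone numbers, and so on).
PATTERNS = {
    "ipv4": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
    ),
    "cve": re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE),
    "attack_technique": re.compile(r"\bT\d{4}(?:\.\d{3})?\b"),
    "sha256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
    "md5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def extract_indicators(text):
    """Return a set of (kind, value) pairs found in the text."""
    found = set()
    for kind, pattern in PATTERNS.items():
        for match in pattern.findall(text):
            # Normalize case so the same indicator maps to one node.
            value = match.upper() if kind == "cve" else match.lower() if kind in ("sha256", "md5", "email") else match
            found.add((kind, value))
    return found


def build_graph(docs):
    """Map each indicator to the set of document ids that mention it."""
    graph = defaultdict(set)
    for doc_id, text in docs.items():
        for indicator in extract_indicators(text):
            graph[indicator].add(doc_id)
    return graph


def shared_connections(graph):
    """Keep only indicators that connect two or more documents."""
    return {ind: ids for ind, ids in graph.items() if len(ids) >= 2}


def to_cypher(doc_id, kind, value):
    """Build a parameterized Cypher statement (assumed schema) and its parameters."""
    query = (
        "MERGE (d:Document {id: $doc_id}) "
        "MERGE (i:Indicator {type: $kind, value: $value}) "
        "MERGE (d)-[:MENTIONS]->(i)"
    )
    return query, {"doc_id": doc_id, "kind": kind, "value": value}


# Made-up example documents using reserved documentation addresses.
docs = {
    "blog-1": "The dropper contacted 203.0.113.7 and exploited CVE-2021-44228 (T1190).",
    "tweet-9": "Seeing scans from 203.0.113.7, likely cve-2021-44228.",
    "report-3": "Phishing via T1566.001 from billing@example.com.",
}
shared = shared_connections(build_graph(docs))
```

In this toy input, the IP address and the CVE identifier each link the blog post to the tweet, while the technique IDs and the email address appear in only one document and therefore create no shared connection. Using `MERGE` rather than `CREATE` keeps repeated mentions of the same indicator attached to a single node, which is what lets relationship queries traverse from one document to another.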