Cyber threat and attack intelligence is available in non-standard formats from heterogeneous sources. Comprehending this information and using it for threat intelligence extraction requires the involvement of security experts. Knowledge graphs make it possible to convert unstructured information from heterogeneous sources into a structured representation of data and factual knowledge that supports several downstream tasks, such as predicting missing information and future threat trends. Existing large-scale knowledge graphs mainly focus on general classes of entities and the relationships between them; open-source knowledge graphs for the security domain do not exist. To fill this gap, we have built \textsf{TINKER}, a knowledge graph for threat intelligence (\textbf{T}hreat \textbf{IN}telligence \textbf{K}nowl\textbf{E}dge g\textbf{R}aph). \textsf{TINKER} is generated from RDF triples that describe entities and relations in tokenized, unstructured natural language text drawn from 83 threat reports published between 2006 and 2021. We built \textsf{TINKER} using classes and properties defined by an open-source malware ontology, together with hand-annotated RDF triples. We also discuss ongoing research and the challenges faced while creating \textsf{TINKER}.
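To make the triple-based representation more concrete, the following is a minimal Python sketch, not the \textsf{TINKER} pipeline. It stores hand-annotated (subject, predicate, object) facts in a small in-memory store and serializes them as N-Triples. The base IRI, the entity names (ExampleCampaign, ExampleMalware, ...) and the property names (uses, targets, type) are hypothetical placeholders. They are not classes or properties taken from the malware ontology used to build \textsf{TINKER}, and the facts are not drawn from the 83 reports.

```python
from collections import defaultdict
from typing import NamedTuple

# Hypothetical base IRI for illustration only; not TINKER's actual namespace.
BASE = "http://example.org/tinker#"


class Triple(NamedTuple):
    """One hand-annotated fact: (subject, predicate, object)."""
    subject: str
    predicate: str
    obj: str


def to_ntriples(triples):
    """Serialize triples as N-Triples lines, treating every term as an IRI under BASE."""
    return "\n".join(
        f"<{BASE}{t.subject}> <{BASE}{t.predicate}> <{BASE}{t.obj}> ."
        for t in triples
    )


class KnowledgeGraph:
    """Tiny in-memory triple store that de-duplicates facts and indexes them by subject."""

    def __init__(self, triples=()):
        self.triples = set()
        self.by_subject = defaultdict(set)
        for t in triples:
            self.add(t)

    def add(self, t):
        self.triples.add(t)
        self.by_subject[t.subject].add((t.predicate, t.obj))

    def objects(self, subject, predicate):
        """Return, sorted, all objects linked to `subject` via `predicate`."""
        return sorted(o for p, o in self.by_subject.get(subject, ()) if p == predicate)


# Hypothetical annotations standing in for facts extracted from a threat report.
triples = [
    Triple("ExampleCampaign", "uses", "ExampleMalware"),
    Triple("ExampleMalware", "type", "Malware"),
    Triple("ExampleCampaign", "targets", "ExampleSector"),
    Triple("ExampleCampaign", "uses", "ExampleTool"),
    Triple("ExampleCampaign", "uses", "ExampleMalware"),  # duplicate annotation
]

kg = KnowledgeGraph(triples)
print(kg.objects("ExampleCampaign", "uses"))  # ['ExampleMalware', 'ExampleTool']
print(to_ntriples(sorted(kg.triples)))
```

Storing triples in a set means that the same fact annotated in several reports collapses into one edge, while the subject index keeps simple lookups cheap. A production graph would more likely rely on an RDF library and the ontology's own IRIs.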