Knowledge mining of unstructured information: application to cyber-domain

Cyber intelligence is abundantly available from numerous open online sources that report on vulnerabilities and incidents. This constant stream of noisy information requires new tools and techniques if it is to benefit analysts and investigators across organizations. In this paper we present and implement a novel knowledge graph and knowledge mining framework for extracting relevant information from free-form text about incidents in the cyber domain. Our framework includes a machine learning based pipeline as well as crawling methods for generating graphs of entities, attackers and the related information using our non-technical cyber ontology. We test our framework on publicly available cyber incident datasets to evaluate the accuracy of our knowledge mining methods as well as the usefulness of the framework to cyber analysts. Our results show that, by analyzing the knowledge graph constructed with the framework, an analyst can infer additional information about the current cyber landscape in terms of risk to various entities and the propagation of risk between industries and countries. Expanding the framework to accommodate more technical and operational level information can increase the accuracy and explainability of trends and risk in the knowledge graph.
