Generating content that is both relevant and current with the threats facing the target audience is critical to the success of any Cyber Security Exercise (CSE). In this work, we explore the results of applying machine learning techniques to unstructured information sources to generate structured CSE content. Our corpus is a large dataset of publicly available cyber security articles, which we used to predict future threats and to form the skeleton of new exercise scenarios. We used machine learning techniques, such as named entity recognition (NER) and topic extraction, to structure the information according to a novel ontology we developed, the Cyber Exercise Scenario Ontology (CESO). Moreover, we used clustering with outliers to classify the extracted data into objects of our ontology. We used graph comparison methodologies to match generated scenario fragments to known threat actors' tactics and, with the help of synthetic text generators, to enrich the proposed scenario accordingly. Our AI-assisted Cyber Exercise Framework (AiCEF) also uses CESO as the primary way to express both the fragments and the final proposed scenario content. We put our methodology to the test by providing a set of generated scenarios to a group of experts for evaluation, to be used as part of a real-world awareness tabletop exercise.