Cyber-Attack Consequence Prediction

From arXiv, 9 pages. Preprint of a paper to appear in the proceedings of the 3rd Workshop on Big Data Engineering and Analytics in Cyber-Physical Systems (BigEACPS'20), IEEE BigData Conference 2020.

Cyber-physical systems pose a number of complex security challenges because they interconnect heterogeneous devices with limited processing, communication, and power capabilities. The combination of physical space and cyberspace also makes it difficult to devise a single security plan that spans both. Cybersecurity researchers are often overloaded with a variety of cyber-alerts on a daily basis, many of which turn out to be false positives. In this paper, we use machine learning and natural language processing techniques to predict the consequences of cyberattacks. The idea is to give security researchers tools that make it easier to communicate attack consequences to stakeholders who may have little to no cybersecurity expertise. The proposed approach can also reduce researchers' cognitive load by automatically predicting the consequences of newly discovered attacks. We compare the performance of various machine learning models that use word vectors obtained from both tf-idf and Doc2Vec models. In our experiments, models based on LinearSVC achieved an accuracy of 60% with tf-idf features and 57% with Doc2Vec features.


Related content

TF-IDF


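TF-IDF (term frequency–inverse document frequency) is a weighting technique commonly used in information retrieval and text mining. It is a statistical measure of how important a word is to one document within a document collection or corpus. A word's importance increases in proportion to the number of times it appears in the document, but decreases with how frequently it appears across the corpus. Variants of tf-idf weighting are often used by search engines to score or rank the relevance of a document to a user query. Besides tf-idf, web search engines also use ranking methods based on link analysis to determine the order in which documents appear in search results.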
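The weighting described above can be illustrated with a minimal sketch. It assumes one common variant, tf = count / document length and idf = ln(N / document frequency); the function name `tf_idf` and the toy corpus are hypothetical and are not taken from the paper, and real libraries typically apply additional smoothing and normalization.

```python
import math
from collections import Counter

def tf_idf(corpus):
    """Return one {term: weight} dict per tokenized document in corpus.

    Assumed variant: tf = count / len(doc), idf = ln(N / df).
    """
    n_docs = len(corpus)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for doc in corpus for term in set(doc))
    weights = []
    for doc in corpus:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical toy corpus of attack descriptions, already tokenized.
corpus = [
    "attacker gains root access".split(),
    "attacker reads sensitive files".split(),
    "attacker causes denial of service".split(),
]
scores = tf_idf(corpus)
```

In this toy corpus, "attacker" appears in every document, so its idf is ln(3/3) = 0 and it receives zero weight, while a term that occurs in only one document, such as "root", receives a positive weight.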
