Many cyber network defense tools rely on the National Vulnerability Database (NVD) for timely information on known vulnerabilities in the systems on a given network. However, recent studies indicate that the NVD is not always up to date: known vulnerabilities are discussed publicly on social media platforms such as Twitter and Reddit months before they are published to the NVD. To address this gap, we present a framework for unsupervised classification that filters tweets for relevance to cyber security. We consider and evaluate two unsupervised machine learning techniques for inclusion in our framework, and show that zero-shot classification using a Bidirectional and Auto-Regressive Transformers (BART) model outperforms the other technique with 83.52% accuracy and an F1 score of 83.88, allowing accurate filtering of tweets without human intervention or labelled training data. Additionally, we discuss insights that can be derived from these cyber-relevant tweets, such as trending topics and counts of Twitter mentions of Common Vulnerabilities and Exposures (CVEs), which can be used in an alert or report to augment current NVD-based risk assessment tools.
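As an illustration of the CVE mention counts described above, the following is a minimal sketch rather than the framework's actual implementation. It assumes the tweets have already been filtered for cyber relevance and are available as plain strings. The function name `count_cve_mentions` and the sample tweets are hypothetical and exist only for this example.

```python
import re
from collections import Counter

# CVE identifiers have the form CVE-YYYY-NNNN, where the final
# sequence number has four or more digits.
CVE_PATTERN = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)


def count_cve_mentions(tweets):
    """Count how many tweets mention each CVE identifier.

    A CVE repeated within one tweet is counted once for that tweet,
    so the result reflects the number of mentioning tweets.
    """
    counts = Counter()
    for text in tweets:
        # Normalise case and de-duplicate within a single tweet.
        ids_in_tweet = {match.upper() for match in CVE_PATTERN.findall(text)}
        counts.update(ids_in_tweet)
    return counts


# Hypothetical sample data, invented for illustration only.
sample_tweets = [
    "Patch now: CVE-2021-44228 is being exploited in the wild",
    "cve-2021-44228 PoC released, also see CVE-2021-45046",
    "Great weather today",
    "CVE-2021-44228 CVE-2021-44228 again",
]

mention_counts = count_cve_mentions(sample_tweets)
for cve_id, n in mention_counts.most_common():
    print(cve_id, n)
```

Counts of this kind could feed an alert or report alongside NVD data. Counting each CVE once per tweet is a design choice made here for simplicity, and other weightings, for example by retweets, are equally plausible.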