A Framework for Unsupervised Classification and Data Mining of Tweets about Cyber Vulnerabilities - 专知论文


April 23, 2021


Kenneth Alperin, Emily Joback, Leslie Shing, Gabe Elkin

Many cyber network defense tools rely on the National Vulnerability Database (NVD) to provide timely information on known vulnerabilities that exist within systems on a given network. However, recent studies have indicated that the NVD is not always up to date: known vulnerabilities are discussed publicly on social media platforms, such as Twitter and Reddit, months before they are published to the NVD. To that end, we present a framework for unsupervised classification to filter tweets for relevance to cyber security. We consider and evaluate two unsupervised machine learning techniques for inclusion in our framework, and show that zero-shot classification using a Bidirectional and Auto-Regressive Transformers (BART) model outperforms the other technique with 83.52% accuracy and an F1 score of 83.88, allowing for accurate filtering of tweets without human intervention or labelled data for training. Additionally, we discuss different insights that can be derived from these cyber-relevant tweets, such as trending topics of tweets and the counts of Twitter mentions for Common Vulnerabilities and Exposures (CVEs), which can be used in an alert or report to augment current NVD-based risk assessment tools.
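The two steps the abstract describes, filtering tweets by zero-shot classification and then counting CVE mentions in the relevant ones, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The label names, the helper names (`build_bart_classifier`, `filter_relevant`, `count_cve_mentions`), and the choice of the public `facebook/bart-large-mnli` checkpoint through the Hugging Face `transformers` pipeline are all assumptions; the abstract does not state which labels, checkpoint, or tooling were used.

```python
import re
from collections import Counter
from typing import Callable, Iterable, List

# CVE identifiers have the form CVE-<4-digit year>-<4 or more digits>.
CVE_PATTERN = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)

# Hypothetical label names; the abstract does not list the labels used.
RELEVANT_LABEL = "cyber security"
OTHER_LABEL = "other"


def build_bart_classifier() -> Callable[[str], str]:
    """Return a function mapping a tweet to its top zero-shot label.

    Assumes the Hugging Face `transformers` package and the public
    `facebook/bart-large-mnli` checkpoint (an assumption, not the
    paper's confirmed setup). Calling this downloads the model.
    """
    from transformers import pipeline  # imported lazily: heavy dependency

    zero_shot = pipeline(
        "zero-shot-classification", model="facebook/bart-large-mnli"
    )

    def top_label(text: str) -> str:
        result = zero_shot(text, candidate_labels=[RELEVANT_LABEL, OTHER_LABEL])
        return result["labels"][0]  # labels are sorted by descending score

    return top_label


def filter_relevant(
    tweets: Iterable[str], classify: Callable[[str], str]
) -> List[str]:
    """Keep only tweets whose predicted label is the cyber-relevant one."""
    return [t for t in tweets if classify(t) == RELEVANT_LABEL]


def count_cve_mentions(tweets: Iterable[str]) -> Counter:
    """Count how many tweets mention each CVE ID (once per tweet)."""
    counts: Counter = Counter()
    for tweet in tweets:
        # Normalize case and de-duplicate within a single tweet.
        counts.update({m.upper() for m in CVE_PATTERN.findall(tweet)})
    return counts
```

In use, `filter_relevant(tweets, build_bart_classifier())` would produce the cyber-relevant subset, and `count_cve_mentions(...).most_common(10)` would give the most-mentioned CVE IDs for an alert or report. Passing the classifier in as a plain callable keeps the counting logic testable without loading the model.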

