Anomaly Detection in Cybersecurity: Unsupervised, Graph-Based and Supervised Learning Methods in Adversarial Environments

Machine learning for anomaly detection has become a widely researched field in cybersecurity. Today's operating environment also includes adversarial machine learning, in which attackers attempt to circumvent machine learning models. In this work, we examine the feasibility of unsupervised learning and graph-based methods for anomaly detection in the network intrusion detection system setting, and we apply an ensemble approach to supervised learning for the same problem. We incorporate a realistic adversarial training mechanism when training our supervised models to enable strong classification performance in adversarial environments. Our results indicate that the supervised two-level stacking ensemble outperformed the unsupervised and graph-based methods in detecting anomalies (malicious activity). This model consists of three different classifiers in the first level, followed by either a Naive Bayes or a Decision Tree classifier in the second level. Our model maintains an F1-score above 0.97 for malicious samples across all tested level-two classifiers. Notably, Naive Bayes is the fastest level-two classifier, averaging 1.12 seconds, while Decision Tree achieves the highest AUC score of 0.98.
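The two-level stacking idea in the abstract can be sketched in a few lines of plain Python. This is only an illustration under loud assumptions, not the authors' pipeline: the abstract does not name the three level-one classifiers, so single-feature threshold stumps stand in for them; the data is synthetic; a simple held-out split (rather than cross-validated folds) feeds the level-two Naive Bayes; and no adversarial training is included. All function names here are hypothetical.

```python
import math
import random

def fit_stump(rows, labels, feature):
    """Level-one stand-in: pick the threshold on one feature that maximises training accuracy."""
    best_thr, best_acc = None, -1.0
    for thr in sorted({r[feature] for r in rows}):
        acc = sum((r[feature] >= thr) == bool(y) for r, y in zip(rows, labels)) / len(rows)
        if acc > best_acc:
            best_thr, best_acc = thr, acc
    return lambda r: int(r[feature] >= best_thr)

def fit_bernoulli_nb(meta_rows, labels):
    """Level two: Bernoulli Naive Bayes (Laplace-smoothed) over the binary level-one votes."""
    n_feat = len(meta_rows[0])
    stats = {}
    for c in (0, 1):
        members = [m for m, y in zip(meta_rows, labels) if y == c]
        prior = (len(members) + 1) / (len(meta_rows) + 2)
        p_one = [(sum(m[j] for m in members) + 1) / (len(members) + 2) for j in range(n_feat)]
        stats[c] = (prior, p_one)

    def predict(m):
        scores = {}
        for c, (prior, p_one) in stats.items():
            s = math.log(prior)
            for bit, p in zip(m, p_one):
                s += math.log(p if bit else 1 - p)
            scores[c] = s
        return max(scores, key=scores.get)
    return predict

def make_data(n, rng):
    """Synthetic traffic (assumption): label 1 = malicious, with shifted feature means."""
    rows, labels = [], []
    for _ in range(n):
        y = int(rng.random() < 0.3)
        centre = 4.0 if y else 0.0
        rows.append([rng.gauss(centre, 1.0) for _ in range(3)])
        labels.append(y)
    return rows, labels

rng = random.Random(0)
X1, y1 = make_data(200, rng)  # trains the level-one models
X2, y2 = make_data(200, rng)  # held-out split that trains level two
X3, y3 = make_data(200, rng)  # test split

level_one = [fit_stump(X1, y1, f) for f in range(3)]

def to_meta(row):
    """Turn a raw sample into the vector of level-one predictions."""
    return [clf(row) for clf in level_one]

level_two = fit_bernoulli_nb([to_meta(r) for r in X2], y2)
preds = [level_two(to_meta(r)) for r in X3]
accuracy = sum(p == y for p, y in zip(preds, y3)) / len(y3)
print(f"stacked accuracy on synthetic test split: {accuracy:.3f}")
```

Training level two on samples the level-one models never saw keeps the meta-classifier from simply learning their training-set overconfidence. Swapping `fit_bernoulli_nb` for a small decision tree would give the other level-two variant the abstract mentions.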

Related content

Anomaly detection

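In data mining, anomaly detection is the identification of items, events, or observations that do not conform to an expected pattern or to the other items in a dataset. Anomalous items typically translate into problems such as bank fraud, structural defects, medical problems, or errors in text. Anomalies are also referred to as outliers, novelties, noise, deviations, and exceptions. In abuse and network intrusion detection in particular, the interesting objects are often not rare objects but unexpected bursts of activity. This pattern does not fit the usual statistical definition of an outlier as a rare object, so many anomaly detection methods (unsupervised methods in particular) will fail on such data unless it has been aggregated appropriately. Cluster analysis algorithms, by contrast, may be able to detect the micro-clusters formed by these patterns. There are three broad categories of anomaly detection methods. [1] Unsupervised anomaly detection methods detect anomalies in unlabeled test data by looking for the instances that fit least well with the rest of the data, under the assumption that the majority of instances in the dataset are normal. Supervised anomaly detection methods require a dataset that has been labeled "normal" and "abnormal" and involve training a classifier (the key difference from many other statistical classification problems is the inherently unbalanced nature of anomaly detection). Semi-supervised anomaly detection methods build a model of normal behavior from a given normal training dataset and then test the likelihood that a test instance was generated by the learned model.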
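As a minimal sketch of the unsupervised case described above, the snippet below scores each point by its mean distance to its k nearest neighbours and flags the point that fits least well with the rest. The scoring rule, the function name `knn_scores`, and the toy points are illustrative assumptions, not a method taken from the text.

```python
import math

def knn_scores(points, k=3):
    """Score each point by its mean Euclidean distance to its k nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

# Toy unlabeled data (assumption): five clustered points and one far away.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2), (0.2, 0.2), (5.0, 5.0)]
scores = knn_scores(points, k=3)

# The instance with the largest score is the one that matches the rest least.
most_anomalous = max(range(len(points)), key=scores.__getitem__)
print(points[most_anomalous])
```

This also shows the limitation noted in the text: a burst of k+1 or more near-identical malicious points would sit close to one another, receive low scores, and go unflagged unless the data were aggregated first.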
