MAD-GAN: Multivariate Anomaly Detection for Time Series Data with Generative Adversarial Networks

The prevalence of networked sensors and actuators in many real-world systems such as smart buildings, factories, power plants, and data centers generates substantial amounts of multivariate time series data for these systems. This rich sensor data can be continuously monitored for intrusion events through anomaly detection. However, conventional threshold-based anomaly detection methods are inadequate because of the dynamic complexity of these systems, while supervised machine learning methods cannot exploit the large volume of data because labeled data are lacking. Current unsupervised machine learning approaches, in turn, have not fully exploited the spatial-temporal correlations and other dependencies among the multiple variables (sensors/actuators) in the system for detecting anomalies. In this work, we propose an unsupervised multivariate anomaly detection method based on Generative Adversarial Networks (GANs). Instead of treating each data stream independently, our proposed MAD-GAN framework considers the entire variable set concurrently to capture the latent interactions among the variables. We also fully exploit both the generator and the discriminator produced by the GAN, using a novel anomaly score called DR-score to detect anomalies by discrimination and reconstruction. We have tested our proposed MAD-GAN on two recent datasets collected from real-world cyber-physical systems (CPS): the Secure Water Treatment (SWaT) and the Water Distribution (WADI) datasets. Our experimental results show that the proposed MAD-GAN is effective in reporting anomalies caused by various cyber-intrusions in these complex real-world systems.
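The abstract says only that the DR-score detects anomalies "by discrimination and reconstruction"; it does not give the formula. The following is a minimal sketch of one way such a score could be combined, assuming a weighted sum of a per-time-step reconstruction residual (from the generator) and a discrimination term (one minus the discriminator's probability that the sample is real). The function names, the weight `lam`, and the fixed threshold are illustrative assumptions, not taken from the source.

```python
def dr_score(recon_residual, disc_prob_real, lam=0.5):
    """Combine reconstruction and discrimination evidence per time step.

    recon_residual: per-step reconstruction errors (higher = more anomalous).
    disc_prob_real: per-step discriminator outputs in [0, 1]
                    (higher = looks more like normal data).
    lam:            assumed trade-off weight in [0, 1] (hypothetical parameter).
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(recon_residual) != len(disc_prob_real):
        raise ValueError("inputs must have the same length")
    # Assumed form: a convex combination of the two anomaly signals.
    return [
        lam * r + (1.0 - lam) * (1.0 - p)
        for r, p in zip(recon_residual, disc_prob_real)
    ]


def flag_anomalies(scores, threshold):
    """Mark a time step as anomalous when its score exceeds the threshold."""
    return [s > threshold for s in scores]


# Toy values: the second step reconstructs poorly and looks fake to the
# discriminator, so it receives the higher score.
scores = dr_score([0.1, 0.9], [0.9, 0.2], lam=0.5)
flags = flag_anomalies(scores, threshold=0.5)
```

In practice both inputs would come from a trained GAN evaluated over sliding windows of the multivariate series; the toy lists above only illustrate how the two signals are merged.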


Related content

Anomaly detection

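In data mining, anomaly detection is the identification of items, events, or observations that do not conform to an expected pattern or to the other items in a dataset. Anomalous items typically translate into some kind of problem, such as bank fraud, a structural defect, a medical problem, or an error in a text. Anomalies are also referred to as outliers, novelties, noise, deviations, and exceptions. In the context of abuse and network intrusion detection in particular, the objects of interest are often not rare objects but unexpected bursts of activity. This pattern does not fit the common statistical definition of an outlier as a rare object, so many anomaly detection methods (unsupervised methods in particular) will fail on such data unless it has been aggregated appropriately. A cluster analysis algorithm, by contrast, may be able to detect the micro-clusters formed by these patterns. There are three broad categories of anomaly detection methods. [1] Unsupervised anomaly detection methods detect anomalies in unlabeled test data by looking for the instances that fit the rest of the data least well, under the assumption that the majority of instances in the dataset are normal. Supervised anomaly detection methods require a dataset labeled as "normal" and "abnormal" and involve training a classifier (the key difference from many other statistical classification problems is the inherently unbalanced nature of anomaly detection). Semi-supervised anomaly detection methods build a model of normal behavior from a given normal training dataset and then test the likelihood that a test instance was generated by the learned model.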
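The semi-supervised category described above can be sketched as follows. This is a minimal illustration only, assuming a one-dimensional signal and a single Gaussian as the model of normal behavior; the function names and the log-likelihood cutoff are hypothetical choices, not something the text prescribes.

```python
import math
import statistics


def fit_normal_model(train):
    """Fit a Gaussian (mean, standard deviation) to normal-only training data."""
    mu = statistics.fmean(train)
    sigma = statistics.stdev(train)  # sample standard deviation
    if sigma == 0:
        raise ValueError("training data has zero variance")
    return mu, sigma


def log_likelihood(x, mu, sigma):
    """Log-density of x under the fitted Gaussian."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def is_anomalous(x, model, min_log_likelihood):
    """Flag x when the learned model assigns it too low a likelihood."""
    mu, sigma = model
    return log_likelihood(x, mu, sigma) < min_log_likelihood


# Train on readings assumed to be normal, then score unseen readings.
model = fit_normal_model([9.8, 10.1, 10.0, 9.9, 10.2])
typical = is_anomalous(10.0, model, min_log_likelihood=-5.0)
unusual = is_anomalous(12.0, model, min_log_likelihood=-5.0)
```

A real system would replace the single Gaussian with a richer model of normal behavior, but the two steps stay the same: learn from normal data only, then test how likely each new instance is under that model.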