Local Evaluation of Time Series Anomaly Detection Algorithms

In recent years, specific evaluation metrics for time series anomaly detection algorithms have been developed to address the limitations of the classical precision and recall. However, such metrics are heuristically built as an aggregate of multiple desirable aspects; they introduce parameters and wipe out the interpretability of the output. In this article, we first highlight the limitations of the classical precision/recall, as well as the main issues of the recent event-based metrics; for instance, we show that an adversary algorithm can reach high precision and recall on almost any dataset under weak assumptions. To cope with these problems, we propose a theoretically grounded, robust, parameter-free and interpretable extension to precision/recall metrics, based on the concept of "affiliation" between the ground truth and the prediction sets. Our metrics leverage measures of duration between ground truth and predictions, and thus have an intuitive interpretation. By further comparison against random sampling, we obtain a normalized precision/recall, quantifying how much better a given set of results is than a random baseline prediction. By construction, our approach keeps the evaluation local with respect to ground truth events, enabling fine-grained visualization and interpretation of algorithmic results. We evaluate our proposal on various public time series anomaly detection datasets and algorithms, and compare it against other metrics. We further derive theoretical properties of the affiliation metrics that give explicit expectations about their behavior and ensure robustness against adversary strategies.
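The abstract describes the affiliation metrics only in words. The following is a minimal discrete-time sketch of the idea, written under our own simplifying assumptions: integer timestamps, boolean label arrays, affiliation zones obtained by assigning each timestamp to its nearest ground-truth event, and per-zone scores expressed as the probability that a uniformly random point in the zone is at least as far away as the observed one (values near 1 mean predictions are much closer than chance). It is not the authors' reference implementation; the paper's definitions may differ in details such as continuous-time intervals and tie handling, and all function names here are hypothetical.

```python
import numpy as np


def _events(mask):
    """Return inclusive (start, end) index pairs for each run of True values."""
    padded = np.concatenate([[0], mask.astype(int), [0]])
    diff = np.diff(padded)
    starts = np.flatnonzero(diff == 1)
    ends = np.flatnonzero(diff == -1) - 1
    return list(zip(starts, ends))


def _dist_to_interval(t, start, end):
    """Distance from timestamp(s) t to the closed interval [start, end]."""
    return np.maximum(np.maximum(start - t, t - end), 0)


def affiliation_precision_recall(gt, pred):
    """Hypothetical discrete-time sketch of affiliation-based precision/recall.

    gt, pred: boolean arrays of equal length (one entry per timestamp).
    Returns (precision, recall); precision is nan if nothing is predicted.
    """
    gt = np.asarray(gt, dtype=bool)
    pred = np.asarray(pred, dtype=bool)
    events = _events(gt)
    if not events:
        raise ValueError("ground truth must contain at least one event")
    t = np.arange(len(gt))

    # Affiliation zones (assumption): each timestamp goes to its closest event.
    dists = np.stack([_dist_to_interval(t, s, e) for s, e in events])
    zone = np.argmin(dists, axis=0)

    precisions, recalls = [], []
    for j, (s, e) in enumerate(events):
        zone_t = t[zone == j]            # timestamps in zone j
        pred_t = zone_t[pred[zone_t]]    # predicted timestamps in zone j
        gt_t = np.arange(s, e + 1)       # timestamps of ground-truth event j
        # Distances from every zone point to the event (uniform random baseline).
        random_d = _dist_to_interval(zone_t, s, e)

        if len(pred_t) > 0:
            # Precision: chance a random zone point is at least as far from the event.
            pred_d = _dist_to_interval(pred_t, s, e)
            precisions.append(np.mean([np.mean(random_d >= d) for d in pred_d]))

            # Recall: for each event point, distance to its closest prediction,
            # compared with its distance to a uniformly random zone point.
            rec = []
            for x in gt_t:
                d = np.min(np.abs(pred_t - x))
                rec.append(np.mean(np.abs(zone_t - x) >= d))
            recalls.append(np.mean(rec))
        else:
            recalls.append(0.0)  # missed event: no prediction in its zone

    precision = float(np.mean(precisions)) if precisions else float("nan")
    return precision, float(np.mean(recalls))
```

For example, with a single event at timestamps 4 and 5 in a series of length 10 and a lone prediction at timestamp 9, the sketch returns approximately `(0.2, 0.2)`. With events at 1–2 and 7–8 and predictions covering only the first event, it returns precision 1.0 and recall 0.5: each event is scored in its own zone, which is the locality property the abstract emphasizes.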



Anomaly detection


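In data mining, anomaly detection is the identification of items, events, or observations that do not conform to an expected pattern or to the other items in a dataset. Typically, the anomalous items correspond to some kind of problem, such as bank fraud, a structural defect, a medical problem, or an error in a text. Anomalies are also referred to as outliers, novelties, noise, deviations, and exceptions. In the context of abuse and network intrusion detection in particular, the interesting objects are often not rare objects but unexpected bursts of activity. This pattern does not fit the common statistical definition of an anomaly as a rare object, so many anomaly detection methods (in particular unsupervised ones) will fail on such data unless it has been aggregated appropriately. Instead, a cluster analysis algorithm may be able to detect the micro-clusters formed by these patterns.

There are three broad categories of anomaly detection methods.[1] Unsupervised anomaly detection methods detect anomalies in unlabeled test data, under the assumption that the majority of instances in the dataset are normal, by looking for the instances that fit least with the rest of the data. Supervised anomaly detection methods require a dataset labeled as "normal" and "abnormal" and involve training a classifier (the key difference from many other statistical classification problems is the inherent imbalance of anomaly detection). Semi-supervised anomaly detection methods build a model representing normal behavior from a given normal training dataset, and then test the likelihood that a test instance was generated by the learned model.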
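As an illustration of the unsupervised and semi-supervised settings described above, here is a minimal one-dimensional sketch using NumPy. The specific scores are our own illustrative choices, not methods prescribed by the text: a median/MAD robust z-score for the unsupervised case (no labels, assuming most points are normal), and a Gaussian fitted to normal-only training data for the semi-supervised case. The thresholds (3.5 and `k=3.0`) and function names are hypothetical. The supervised case is omitted because it amounts to training any classifier on labeled, typically highly imbalanced, data.

```python
import numpy as np


def unsupervised_scores(x):
    """Robust z-scores: distance from the median in units of the scaled MAD.

    No labels are used; the median and MAD are assumed to reflect the
    (majority) normal data, so points that fit least get the largest scores.
    """
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    # 1.4826 makes the MAD consistent with the standard deviation for Gaussian data;
    # the tiny constant avoids division by zero when the MAD is 0.
    return np.abs(x - med) / (1.4826 * mad + 1e-12)


def semi_supervised_detector(normal_train):
    """Fit a 1-D Gaussian to normal-only data; return a function flagging unlikely points."""
    train = np.asarray(normal_train, dtype=float)
    mu, sigma = train.mean(), train.std()

    def is_anomaly(x, k=3.0):
        # Points more than k standard deviations from the normal mean
        # have low likelihood under the learned model of normal behavior.
        return np.abs(np.asarray(x, dtype=float) - mu) > k * sigma

    return is_anomaly
```

With `x = [10, 11, 9, 10, 12, 10, 50]`, thresholding `unsupervised_scores(x) > 3.5` flags only the final value 50. Training `semi_supervised_detector` on `[10, 11, 9, 10, 12, 10]` and applying it to `[10, 14, 50]` flags 14 and 50 but not 10.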
