The field of time series anomaly detection is constantly advancing, and the number of available methods makes it challenging to determine which is most appropriate for a specific domain. These methods are evaluated using metrics, which vary widely in their properties. Although new evaluation metrics have been proposed, there is limited agreement on which metrics are best suited to specific scenarios and domains, and the most commonly used metrics have faced criticism in the literature. This paper provides a comprehensive overview of the metrics used to evaluate time series anomaly detection methods and defines a taxonomy of them based on how they are calculated. Using a defined set of properties for evaluation metrics, together with a set of specific case studies and experiments, the paper analyzes and discusses twenty metrics in detail, highlighting the unique suitability of each for specific tasks. Through extensive experimentation and analysis, this paper argues that the choice of evaluation metric must be made with care, taking into account the specific requirements of the task at hand.