Benchmarking anomaly detection approaches for multivariate time series is challenging due to a lack of high-quality datasets. Currently available public datasets are too small, lack diversity, and contain trivial anomalies, which hinders measurable progress in this research area. We propose a solution: a diverse, extensive, and non-trivial dataset, generated with state-of-the-art simulation tools, that reflects the realistic behaviour of an automotive powertrain, including its multivariate, dynamic, and variable-state properties. Additionally, our dataset represents a discrete-sequence problem, which solutions previously proposed in the literature leave unaddressed. To cater for both unsupervised and semi-supervised anomaly detection settings, as well as time series generation and forecasting, we make several versions of the dataset available, in which the training and test subsets are offered in contaminated and clean variants depending on the task. We also provide baseline results for a selection of approaches based on deterministic and variational autoencoders, as well as a non-parametric approach. As expected, the baseline experiments show that approaches trained on the semi-supervised version of the dataset outperform their unsupervised counterparts, highlighting the need for approaches that are more robust to contaminated training data. Furthermore, the results show that the choice of threshold can strongly influence detection performance, so more work is needed on methods that find a suitable threshold without labelled data.
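The threshold sensitivity and the effect of contaminated training data mentioned above can be illustrated with a small sketch. It assumes point-wise anomaly scores (for example, autoencoder reconstruction errors) and a label-free threshold taken as a percentile of the training scores. The functions `f1_at_threshold` and `percentile_threshold` and all score values are hypothetical illustrations, not the baselines, thresholding strategy, or results from the dataset.

```python
import math
from typing import Sequence


def f1_at_threshold(scores: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """Point-wise F1 when every score strictly above `threshold` is flagged anomalous."""
    tp = sum(1 for s, y in zip(scores, labels) if s > threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s > threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s <= threshold and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def percentile_threshold(train_scores: Sequence[float], q: float) -> float:
    """Label-free threshold: nearest-rank q-th percentile of the training scores."""
    ordered = sorted(train_scores)
    rank = math.ceil(q * len(ordered) / 100) - 1
    return ordered[min(max(rank, 0), len(ordered) - 1)]


# Hypothetical anomaly scores (e.g. reconstruction errors); NOT taken from the dataset.
# The two large training scores (0.40, 0.35) mimic contamination of the training data.
train_scores = [0.10, 0.12, 0.15, 0.11, 0.13, 0.40, 0.14, 0.12, 0.16, 0.35]
test_scores = [0.11, 0.14, 0.30, 0.13, 0.55, 0.12, 0.25, 0.16, 0.60, 0.15]
test_labels = [0, 0, 1, 0, 1, 0, 1, 0, 1, 0]

for q in (80, 95):
    thr = percentile_threshold(train_scores, q)
    f1 = f1_at_threshold(test_scores, test_labels, thr)
    print(f"q={q}: threshold={thr:.2f}, F1={f1:.3f}")
```

On these toy values, the 80th-percentile threshold (0.16) gives F1 = 1.000. The 95th-percentile threshold (0.40) is pulled upward by the contaminated training scores and gives F1 = 0.667. No labels are used to pick either threshold, which is why the choice can swing detection performance so much.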