This paper presents the first time series clustering benchmark that uses all time series datasets currently available in the University of California, Riverside (UCR) archive, the state-of-the-art repository of time series data. Specifically, the benchmark examines eight popular clustering methods representing three categories of clustering algorithms (partitional, hierarchical, and density-based) and three types of distance measures (Euclidean, dynamic time warping, and shape-based). We lay out six restrictions, with special attention to making the benchmark as unbiased as possible. We then design a phased evaluation approach to summarize dataset-level assessment metrics and discuss the results. The benchmark study can serve as a useful reference for the research community on its own, and the reported dataset-level assessment metrics may be used to design evaluation frameworks that answer different research questions.