We present an approach to clustering time series data using a model-based generalization of the K-Means algorithm, which we call K-Models. We prove the convergence of this general algorithm and relate it to the hard-EM algorithm for mixture modeling. We first apply our method to an AR($p$) clustering example and show how the clustering algorithm can be made robust to outliers using a least-absolute-deviations criterion. We then build our clustering algorithm up for ARMA($p,q$) models and extend it to ARIMA($p,d,q$) models. We develop a goodness-of-fit statistic, based on the Ljung-Box statistic, for the models fitted to clusters. We perform experiments with simulated data to show how the algorithm can be used for outlier detection and for detecting distributional drift, and we discuss the impact of the initialization method on empty clusters. We also perform experiments on real data, which show that our method is competitive with existing methods for similar time series clustering tasks.
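To make the K-Models idea concrete, the following is a minimal sketch of the AR($p$) case, not the authors' implementation. It assumes each cluster's model is an AR($p$) regression with intercept fitted by pooled ordinary least squares, that series are assigned to the model with the smallest sum of squared one-step residuals, that initialization is a random labeling, and that an empty cluster simply keeps its previous coefficients. The function names `lagged_design`, `k_models_ar`, and `simulate_ar1` are hypothetical and chosen for illustration only.

```python
import numpy as np

def lagged_design(series, p):
    """Build (X, y) for an AR(p) regression with intercept on one series."""
    n = len(series)
    lags = [series[p - k:n - k] for k in range(1, p + 1)]  # lag-k columns
    X = np.column_stack([np.ones(n - p)] + lags)
    y = series[p:]
    return X, y

def k_models_ar(data, k, p, n_iter=20, seed=0):
    """Alternate between fitting one AR(p) model per cluster and
    reassigning each series to its best-fitting model (assumed scheme)."""
    rng = np.random.default_rng(seed)
    designs = [lagged_design(np.asarray(s, dtype=float), p) for s in data]
    labels = rng.integers(0, k, size=len(data))  # assumed: random initial labels
    coefs = np.zeros((k, p + 1))
    for _ in range(n_iter):
        # Update step: pooled least-squares fit over each cluster's members.
        for j in range(k):
            members = [designs[i] for i in np.flatnonzero(labels == j)]
            if not members:
                continue  # assumed: an empty cluster keeps its old coefficients
            X = np.vstack([m[0] for m in members])
            y = np.concatenate([m[1] for m in members])
            coefs[j] = np.linalg.lstsq(X, y, rcond=None)[0]
        # Assignment step: pick the model with the smallest residual sum of squares.
        sse = np.array([[np.sum((y - X @ c) ** 2) for c in coefs]
                        for X, y in designs])
        new_labels = sse.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable, so the fits will not change either
        labels = new_labels
    return labels, coefs

def simulate_ar1(phi, n, rng):
    """Simulate a zero-mean AR(1) series with standard normal innovations."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x

# Usage: two groups of simulated AR(1) series with opposite-sign coefficients.
rng = np.random.default_rng(1)
data = ([simulate_ar1(0.8, 200, rng) for _ in range(10)]
        + [simulate_ar1(-0.8, 200, rng) for _ in range(10)])
labels, coefs = k_models_ar(data, k=2, p=1)
```

Replacing the squared residuals with absolute residuals in both steps would give a least-absolute-deviations variant in the spirit of the robust criterion mentioned above, although the update step would then need an L1 regression solver rather than `np.linalg.lstsq`.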