We consider the problem of fast time-series data clustering. Building on previous work that models the correlation-based Hamiltonian of spin variables, we present an updated, fast, and inexpensive Agglomerative Likelihood Clustering (ALC) algorithm. The method replaces the optimized genetic-algorithm-based approach (f-SPC) with an agglomerative recursive merging framework inspired by previous work in econophysics and community detection. The method is tested on noisy synthetic correlated time-series data sets with built-in cluster structure to demonstrate that the algorithm produces meaningful, non-trivial results. We apply it to time-series data sets with as many as 20,000 assets and argue that ALC can reduce compute time and resource usage for large-scale time-series clustering while running serially, and hence has no obvious parallelization requirement. Because the algorithm requires no prior information about the number of clusters, it can be an effective choice for state detection in online learning in a fast, non-linear data environment.
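To make the idea of likelihood-driven agglomerative merging concrete, the following is a minimal sketch and not the authors' ALC implementation. It assumes that the objective is the Giada-Marsili cluster log-likelihood commonly used in this line of work, $L_c = \frac{1}{2}\sum_{s:\,n_s>1}\left[\log\frac{n_s}{c_s} + (n_s-1)\log\frac{n_s^2-n_s}{n_s^2-c_s}\right]$, where $n_s$ is the size of cluster $s$ and $c_s$ is the sum of all correlations inside it. It merges the pair of clusters with the largest positive likelihood gain until no such pair remains. The function names, the exhaustive pair search, the handling of the boundary cases for $c_s$, and the toy data generator are all assumptions made for illustration.

```python
import numpy as np


def cluster_loglik(n, c):
    """Log-likelihood contribution of one cluster of size n with internal
    correlation sum c (assumed Giada-Marsili form)."""
    # Assumption: singletons and clusters with c <= n contribute zero.
    if n <= 1 or c <= n:
        return 0.0
    # Assumption: clip c just below n^2 to avoid the divergence at perfect correlation.
    c = min(c, n * n - 1e-9)
    return 0.5 * (np.log(n / c) + (n - 1) * np.log((n * n - n) / (n * n - c)))


def greedy_likelihood_clustering(corr):
    """Greedy agglomerative merging (hypothetical helper): repeatedly merge the
    pair of clusters with the largest positive likelihood gain."""
    # S[a, b] holds the sum of correlations between clusters a and b.
    S = np.array(corr, dtype=float)
    clusters = [[i] for i in range(S.shape[0])]
    while len(clusters) > 1:
        sizes = [len(members) for members in clusters]
        best_gain, best_pair = 0.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                c_merged = S[a, a] + S[b, b] + 2.0 * S[a, b]
                gain = (cluster_loglik(sizes[a] + sizes[b], c_merged)
                        - cluster_loglik(sizes[a], S[a, a])
                        - cluster_loglik(sizes[b], S[b, b]))
                if gain > best_gain:
                    best_gain, best_pair = gain, (a, b)
        if best_pair is None:
            break  # no merge increases the likelihood: stop
        a, b = best_pair
        # Fold cluster b into cluster a, then drop row and column b.
        S[a, :] += S[b, :]
        S[:, a] += S[:, b]
        S = np.delete(np.delete(S, b, axis=0), b, axis=1)
        clusters[a].extend(clusters.pop(b))
    return clusters


def toy_correlation_matrix(seed=0, group_size=5, n_obs=1000, loading=0.8):
    """Hypothetical test data: two groups of series, each driven by its own factor."""
    rng = np.random.default_rng(seed)
    factors = rng.standard_normal((2, n_obs))
    noise = rng.standard_normal((2 * group_size, n_obs))
    common = np.repeat(factors, group_size, axis=0)
    x = np.sqrt(loading) * common + np.sqrt(1.0 - loading) * noise
    return np.corrcoef(x)


clusters = greedy_likelihood_clustering(toy_correlation_matrix())
```

The stopping rule is what removes the need to specify the number of clusters in advance: merging ends when no candidate merge has a positive gain. The exhaustive pair search in this sketch is far too slow for data sets of the size discussed in the abstract and is used here only for clarity.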