A novel methodology is proposed for clustering multivariate time series data using the energy distance defined in Székely and Rizzo (2013). Specifically, a dissimilarity matrix is formed using the energy distance statistic to measure the separation between the finite-dimensional distributions of the component time series. Once the pairwise dissimilarity matrix is calculated, a hierarchical clustering method is applied to obtain the dendrogram. This procedure is completely nonparametric, as the dissimilarities between stationary distributions are calculated directly without making any model assumptions. To justify this procedure, asymptotic properties of the energy distance estimates are derived for general stationary and ergodic time series. The method is illustrated in a simulation study for various component time series that are either linear or nonlinear. Finally, the methodology is applied to two examples: one involves the GDP of selected countries, and the other the population sizes of various U.S. states over the years 1900-1999.
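The pipeline described in the abstract (pairwise energy distances, then hierarchical clustering) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the finite-dimensional distributions are approximated by lagged windows of a hypothetical dimension `d`, the energy distance is estimated with the plain V-statistic form $2\,\mathrm{E}\|X-Y\| - \mathrm{E}\|X-X'\| - \mathrm{E}\|Y-Y'\|$, average linkage is chosen arbitrarily, and the AR(1) test data and all function names are invented for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import cdist, squareform


def lag_embed(x, d):
    """Stack overlapping windows (x_t, ..., x_{t+d-1}) as rows (assumed embedding)."""
    x = np.asarray(x, dtype=float)
    return np.lib.stride_tricks.sliding_window_view(x, d)


def energy_distance(a, b):
    """V-statistic estimate of the energy distance between two point clouds."""
    return 2.0 * cdist(a, b).mean() - cdist(a, a).mean() - cdist(b, b).mean()


def energy_dissimilarity_matrix(series, d=2):
    """Pairwise energy distances between the d-dimensional lag embeddings."""
    embedded = [lag_embed(s, d) for s in series]
    n = len(embedded)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            # Clip at zero to guard against tiny negative rounding errors.
            dist = max(energy_distance(embedded[i], embedded[j]), 0.0)
            D[i, j] = D[j, i] = dist
    return D


def simulate_ar1(phi, n, rng):
    """Hypothetical test data: x_t = phi * x_{t-1} + Gaussian noise."""
    x = np.zeros(n)
    eps = rng.standard_normal(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + eps[t]
    return x


rng = np.random.default_rng(0)
# Two groups with the same marginal law but different lag-1 dependence.
series = [simulate_ar1(0.9, 500, rng) for _ in range(3)]
series += [simulate_ar1(-0.9, 500, rng) for _ in range(3)]

D = energy_dissimilarity_matrix(series, d=2)
# linkage expects the condensed (upper-triangular) form of the matrix.
Z = linkage(squareform(D, checks=False), method="average")
labels = fcluster(Z, t=2, criterion="maxclust")
```

Passing `Z` to `scipy.cluster.hierarchy.dendrogram` would draw the dendrogram mentioned in the abstract. With `d=1` only the marginal distributions would be compared, so the two simulated groups above, which differ only in their serial dependence, would be much harder to separate; that is the motivation for embedding with `d >= 2` in this sketch.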