Clustering is an unsupervised learning technique that is useful when working with a large volume of unlabeled data. Complex dynamical systems in real life often involve data streaming from a large number of sources. Although it is desirable to use all source variables to form accurate state estimates, doing so is often impractical because of the computational power required, and algorithms robust enough to handle such cases are uncommon. We propose a hierarchical time series clustering technique based on symbolic dynamic filtering and Granger causality, which serves as a dimensionality reduction and noise-rejection tool. Our process forms a hierarchy of the variables in the multivariate time series and clusters the relevant variables at each level, thus separating out noise and less relevant variables. We also propose a new distance metric based on Granger causality, use it for the time series clustering, and validate it on empirical data sets. Experimental results on occupancy detection and building temperature estimation tasks show that the approach stays faithful to the empirical data sets and maintains state-prediction accuracy at substantially reduced data dimensionality.
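To make the idea of clustering time series by a Granger-causality-based distance concrete, the following is a minimal sketch under stated assumptions and not the method of the paper. The abstract does not define the proposed metric, so the distance used here, 1 / (1 + max of the two directional lag-1 Granger strengths), is a hypothetical stand-in. The symbolic dynamic filtering step and the multi-level hierarchy are omitted, and plain single-linkage agglomeration is swapped in for the clustering. All function names (`granger_strength`, `granger_distance`, `agglomerate`) and the synthetic data are illustrative.

```python
import math
import random


def _demean(s):
    m = sum(s) / len(s)
    return [v - m for v in s]


def _dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def granger_strength(x, y, eps=1e-12):
    """Lag-1 Granger strength of x -> y: log(RSS_restricted / RSS_unrestricted)."""
    x, y = _demean(x), _demean(y)
    yt, yl, xl = y[1:], y[:-1], x[:-1]
    syy, sxx, sxy = _dot(yl, yl), _dot(xl, xl), _dot(yl, xl)
    ty, tx = _dot(yt, yl), _dot(yt, xl)
    # Restricted model: y_t = a * y_{t-1}
    a_r = ty / (syy + eps)
    rss_r = sum((t - a_r * l) ** 2 for t, l in zip(yt, yl))
    # Unrestricted model: y_t = a * y_{t-1} + b * x_{t-1} (2x2 normal equations)
    det = syy * sxx - sxy ** 2
    if abs(det) < eps:  # lagged regressors are collinear, x adds no information
        return 0.0
    a_u = (ty * sxx - tx * sxy) / det
    b_u = (tx * syy - ty * sxy) / det
    rss_u = sum((t - a_u * l - b_u * m) ** 2 for t, l, m in zip(yt, yl, xl))
    return max(0.0, math.log((rss_r + eps) / (rss_u + eps)))


def granger_distance(x, y):
    """Hypothetical symmetric distance in (0, 1]: small when either series helps predict the other."""
    g = max(granger_strength(x, y), granger_strength(y, x))
    return 1.0 / (1.0 + g)


def agglomerate(series, k):
    """Single-linkage agglomerative clustering of series indices into k clusters."""
    n = len(series)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = granger_distance(series[i], series[j])
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                link = min(d[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or link < best[0]:
                    best = (link, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)  # merge the two closest clusters
    return clusters


# Synthetic example (assumed data): y is driven by lagged x, z is unrelated noise.
rng = random.Random(0)
n = 500
x = [rng.gauss(0, 1) for _ in range(n)]
y = [0.0] + [0.8 * x[t - 1] + rng.gauss(0, 0.1) for t in range(1, n)]
z = [rng.gauss(0, 1) for _ in range(n)]
clusters = agglomerate([x, y, z], k=2)
print(clusters)
```

In this toy setting the causally linked pair should end up in one cluster while the unrelated noise series stays on its own, which mirrors the noise-rejection role described in the abstract. A practical version would use longer lags, a statistical test for the Granger strength, and a library implementation of hierarchical clustering.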