We introduce VISTA, a clustering approach for multivariate and irregularly sampled time series based on a parametric state space mixture model. VISTA is specifically designed for the unsupervised identification of groups in datasets originating from healthcare and psychology where such sampling issues are commonplace. Our approach adapts linear Gaussian state space models (LGSSMs) to provide a flexible parametric framework for fitting a wide range of time series dynamics. The clustering approach itself is based on the assumption that the population can be represented as a mixture of a given number of LGSSMs. VISTA's model formulation allows for an explicit derivation of the log-likelihood function, from which we develop an expectation-maximization scheme for fitting model parameters to the observed data samples. Our algorithmic implementation is designed to handle populations of multivariate time series that can exhibit large changes in sampling rate as well as irregular sampling. We evaluate the versatility and accuracy of our approach on simulated and real-world datasets, including demographic trends, wearable sensor data, epidemiological time series, and ecological momentary assessments. Our results indicate that VISTA outperforms most comparable standard times series clustering methods. We provide an open-source implementation of VISTA in Python.
翻译:暂无翻译