We investigate a novel non-parametric, regression-based clustering algorithm for longitudinal data analysis. By combining natural cubic splines with Gaussian mixture models (GMMs), the algorithm produces smooth cluster means that describe the underlying data well. However, the algorithm has two shortcomings: a computationally expensive parameter estimation procedure and a numerically unstable variance estimator. To make the method more usable, we incorporate approaches that reduce its computational complexity, develop a new, more stable variance estimator, and develop a new procedure for estimating the smoothing parameter. We show that the resulting algorithm, SMIXS, outperforms GMM on a synthetic dataset in both clustering and regression performance. We also demonstrate the impact of the computational speed-ups, which we formally prove in the new framework. Finally, we present a case study in which SMIXS is used to cluster vertical atmospheric measurements in order to identify different weather regimes.