In psychopathology, methodological advances in Ecological Momentary Assessment (EMA) have opened new opportunities to collect time-intensive, repeated, intra-individual measurements. The resulting large volume of data provides the means to explore mental disorders further. Consequently, advanced machine learning (ML) methods are needed to understand the characteristics of these data and to uncover hidden, meaningful relationships in the complex psychological processes underlying them. Among other uses, ML can identify similar patterns in the data of different individuals through clustering. This paper focuses on clustering the multivariate time-series (MTS) data of individuals into several groups. Because clustering is an unsupervised problem, it is difficult to assess whether a resulting grouping is successful. We therefore investigate clustering methods based on different distance measures and assess the stability and quality of the clusters they produce. These clustering steps are illustrated on a real-world EMA dataset of 33 individuals and 15 variables. In our evaluation, kernel-based clustering methods appear promising for identifying meaningful groups in the data, which suggests that an efficient representation of EMA data plays an important role in clustering.
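As an illustration of the kind of pipeline summarized above, the following minimal sketch clusters individuals with a kernel method and then scores the result for quality and stability. It is not the paper's implementation. The RBF kernel on flattened series, the median bandwidth heuristic, kernel k-means, the silhouette score, and the Rand index across random restarts are all stand-in choices made for this example. The data are synthetic: only the 33 individuals and 15 variables come from the abstract, while the 40 time points and the two simulated groups are assumptions. All function names are hypothetical.

```python
import numpy as np


def rbf_kernel_matrix(series, gamma=None):
    """RBF kernel between individuals; each series is a (time, variables) array of equal shape."""
    flat = np.stack([s.ravel() for s in series])
    sq = np.sum(flat ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * flat @ flat.T, 0.0)
    if gamma is None:
        # Median heuristic for the bandwidth (an assumption, not the paper's setting).
        gamma = 1.0 / np.median(d2[np.triu_indices_from(d2, k=1)])
    return np.exp(-gamma * d2)


def kernel_kmeans(K, n_clusters, n_iter=50, seed=0):
    """Plain kernel k-means on a precomputed kernel matrix, started from random labels."""
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    labels = rng.integers(n_clusters, size=n)
    for _ in range(n_iter):
        dist = np.full((n, n_clusters), np.inf)  # empty clusters stay at infinity
        for c in range(n_clusters):
            members = labels == c
            if not members.any():
                continue
            # ||phi(x_i) - mu_c||^2 = K_ii - 2 * mean_j K_ij + mean_jl K_jl
            dist[:, c] = (np.diag(K)
                          - 2.0 * K[:, members].mean(axis=1)
                          + K[np.ix_(members, members)].mean())
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels


def silhouette(D, labels):
    """Mean silhouette width from a distance matrix; singleton clusters score 0."""
    clusters = np.unique(labels)
    if len(clusters) < 2:
        return 0.0
    scores = np.zeros(len(labels))
    for i in range(len(labels)):
        same = labels == labels[i]
        same[i] = False
        if not same.any():
            continue
        a = D[i, same].mean()
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return float(scores.mean())


def rand_index(a, b):
    """Fraction of pairs of individuals on which two labelings agree."""
    iu = np.triu_indices(len(a), k=1)
    same_a = (a[:, None] == a[None, :])[iu]
    same_b = (b[:, None] == b[None, :])[iu]
    return float(np.mean(same_a == same_b))


# Synthetic stand-in for an EMA dataset: two simulated groups with different mean levels.
rng = np.random.default_rng(42)
n_individuals, n_timepoints, n_variables = 33, 40, 15
group = np.arange(n_individuals) % 2
data = [rng.normal(loc=1.5 * g, scale=1.0, size=(n_timepoints, n_variables))
        for g in group]

K = rbf_kernel_matrix(data)
labels = kernel_kmeans(K, n_clusters=2, seed=0)

# Quality: silhouette on the distance induced by the kernel.
diag = np.diag(K)
D = np.sqrt(np.maximum(diag[:, None] + diag[None, :] - 2.0 * K, 0.0))
quality = silhouette(D, labels)

# Stability: average agreement between runs that differ only in their random start.
runs = [kernel_kmeans(K, n_clusters=2, seed=s) for s in range(5)]
stability = float(np.mean([rand_index(runs[i], runs[j])
                           for i in range(len(runs))
                           for j in range(i + 1, len(runs))]))

print(f"silhouette={quality:.3f}, stability={stability:.3f}")
```

Flattening each series requires all individuals to share the same number of time points, which real EMA data rarely satisfy. A kernel defined directly on time series of unequal length would be a more faithful choice there, and the rest of the sketch would only need the precomputed matrix `K` replaced.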