Finite mixture models that allow for a broad range of potentially non-elliptical cluster distributions are an emerging methodological field. Such methods allow the shape of the clusters to match the natural heterogeneity of the data, rather than forcing a series of elliptical clusters. These methods are highly relevant for clustering continuous non-normal data, a common occurrence with the objective data that are now routinely captured in health research. However, interpreting and comparing such models, especially with regard to whether they produce meaningful clusters that are reasonably well separated, is non-trivial. We summarize several measures that can succinctly quantify the multivariate distance between two clusters, regardless of the cluster distribution, and suggest practical computational tools. Through a simulation study, we evaluate these measures across three scenarios that allow clusters to differ in mean, scale, and rotation. We then demonstrate our approaches using physiological responses to emotional imagery captured as part of the Transdiagnostic Anxiety Study, a large-scale study of anxiety disorder spectrum patients and control participants. Finally, we synthesize the findings to provide guidance on how to use distance measures in clustering applications.