The $k$-Means algorithm is one of the most popular choices for clustering data, but it is well known to be sensitive to initialization. Many methods aim to find optimal initial seeds for $k$-Means, though none is universally valid. This paper extends one such method, the BRIk algorithm, to longitudinal data. BRIk relies on clustering a set of centroids derived from bootstrap replicates of the data and on the versatile Modified Band Depth. Our approach improves on BRIk by adding a step that fits appropriate B-splines to the observations and a resampling process that keeps the computation feasible and helps handle issues such as noise or missing data. Results on simulated and real data sets indicate that our $F$unctional Data $A$pproach to the BRIk method (FABRIk) is more effective than previous proposals at providing seeds to initialize $k$-Means in terms of clustering recovery.
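To make the role of the Modified Band Depth more concrete, the following is a minimal sketch of the depth computed with bands formed by pairs of curves. It is not the implementation used in this work. The abstract does not spell out how the depth enters the seeding step, so the sketch assumes that, within each group of bootstrap centroids, the deepest curve is taken as the seed. The function names `modified_band_depth` and `deepest_curve` are hypothetical, and curves are assumed to be sampled on a common grid.

```python
from itertools import combinations


def modified_band_depth(curves):
    """Return the Modified Band Depth of every curve in `curves`.

    `curves` is a list of equal-length lists of numbers, one list per
    observation, all sampled on the same grid (an assumption of this sketch).
    Bands are formed by pairs of curves; the sample is assumed to contain
    at least two curves.
    """
    n_curves = len(curves)
    if n_curves < 2:
        raise ValueError("at least two curves are needed to form a band")
    n_points = len(curves[0])
    pairs = list(combinations(range(n_curves), 2))

    depths = []
    for x in curves:
        total = 0.0
        for i, j in pairs:
            # Count the grid points at which x lies inside the band
            # delimited by curves i and j.
            inside = sum(
                1
                for t in range(n_points)
                if min(curves[i][t], curves[j][t])
                <= x[t]
                <= max(curves[i][t], curves[j][t])
            )
            total += inside / n_points
        # Average the proportion of time spent inside a band over all pairs.
        depths.append(total / len(pairs))
    return depths


def deepest_curve(curves):
    """Return (index, curve) of the curve with the largest depth."""
    depths = modified_band_depth(curves)
    best = max(range(len(curves)), key=depths.__getitem__)
    return best, curves[best]
```

For three flat curves at heights 0, 1 and 2, the middle curve lies inside every band and is returned as the deepest one. Under the assumption above, a candidate seed would be obtained by applying `deepest_curve` to each group of centroids separately.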