Mean shift is a simple iterative procedure that gradually shifts data points toward the mode, the point of highest data density in a region. Mean shift algorithms have been used effectively for data denoising, mode seeking, and automatically determining the number of clusters in a dataset. However, the merits of mean shift fade quickly as the data dimension increases and only a handful of features carry useful information about the cluster structure. We propose a simple yet elegant feature-weighted variant of mean shift that efficiently learns feature importance, thereby extending the merits of mean shift to high-dimensional data. The resulting algorithm not only outperforms the conventional mean shift clustering procedure but also preserves its computational simplicity. In addition, the proposed method comes with rigorous theoretical convergence guarantees and a convergence rate of at least cubic order. The efficacy of our proposal is thoroughly assessed through experimental comparison against baseline and state-of-the-art clustering methods on synthetic as well as real-world datasets.
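To make the basic procedure concrete, the following is a minimal sketch of conventional mean shift with a Gaussian kernel. It is not the feature-weighted algorithm proposed here: the function name `mean_shift`, the parameters `bandwidth`, `weights`, `n_iter`, and `tol`, and the choice of a Gaussian kernel are all assumptions made for illustration. The optional `weights` vector is supplied by hand and only rescales each feature inside the distance, whereas the proposed method learns feature importance from the data.

```python
import numpy as np


def mean_shift(X, bandwidth=1.0, weights=None, n_iter=50, tol=1e-6):
    """Shift a copy of every point toward a nearby density mode.

    Illustrative sketch only. `weights` is a hypothetical, fixed
    per-feature weight vector; it is not learned.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    w = np.ones(d) if weights is None else np.asarray(weights, dtype=float)

    Y = X.copy()  # current positions of the shifted points
    for _ in range(n_iter):
        # Weighted squared distances between shifted points and the data.
        diff = Y[:, None, :] - X[None, :, :]          # shape (n, n, d)
        d2 = np.einsum("ijk,k->ij", diff ** 2, w)     # shape (n, n)

        # Gaussian kernel responsibilities (assumed kernel choice).
        K = np.exp(-d2 / (2.0 * bandwidth ** 2))

        # Each point moves to the kernel-weighted mean of the data.
        Y_new = (K @ X) / K.sum(axis=1, keepdims=True)

        largest_move = np.max(np.linalg.norm(Y_new - Y, axis=1))
        Y = Y_new
        if largest_move < tol:
            break
    return Y
```

Points that end up at (numerically) the same location can be treated as members of one cluster, which is how the number of clusters emerges without being specified in advance. Passing a `weights` vector that is small on uninformative features hints at why feature weighting can help in high dimensions, although in this sketch the weights must be chosen manually.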