We introduce skeleton clustering, a density-based clustering method that can detect irregularly shaped clusters in multivariate and even high-dimensional data. To bypass the curse of dimensionality, we propose surrogate density measures that depend less on the dimension but have intuitive geometric interpretations. The clustering framework constructs a concise representation of the given data as an intermediate step and can be viewed as a combination of prototype methods, density-based clustering, and hierarchical clustering. We show through theoretical analysis and empirical studies that skeleton clustering yields reliable clusters in multivariate and high-dimensional settings.
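To make the three ingredients concrete, the following is a minimal sketch of how such a pipeline could be assembled, not the authors' implementation. The abstract does not specify the prototype method, the surrogate density, or the linkage rule, so everything below is an assumption chosen for illustration: prototypes ("knots") come from a plain k-means with farthest-point initialization, the surrogate density between two knots is the number of points whose two nearest knots are that pair divided by the distance between the knots, and knots are merged single-linkage style in order of decreasing similarity. All function names are hypothetical.

```python
import math


def kmeans_knots(points, k, iters=20):
    """Assumed prototype step: Lloyd's k-means with farthest-point init."""
    knots = [points[0]]
    while len(knots) < k:
        # Next knot: the point farthest from all knots chosen so far.
        knots.append(max(points, key=lambda p: min(math.dist(p, c) for c in knots)))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, knots[i]))
            groups[nearest].append(p)
        # Move each knot to the mean of its group; keep it if the group is empty.
        knots = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else knots[i]
            for i, g in enumerate(groups)
        ]
    return knots


def pair_similarity(points, knots):
    """Assumed surrogate density: points whose two nearest knots are (i, j),
    divided by the distance between knots i and j."""
    counts = {}
    for p in points:
        i, j = sorted(range(len(knots)), key=lambda m: math.dist(p, knots[m]))[:2]
        key = (min(i, j), max(i, j))
        counts[key] = counts.get(key, 0) + 1
    return {
        (i, j): c / max(math.dist(knots[i], knots[j]), 1e-12)
        for (i, j), c in counts.items()
    }


def merge_knots(similarity, n_knots, n_clusters):
    """Assumed hierarchical step: merge the most similar knot pairs first
    (single linkage via union-find) until n_clusters groups remain.
    If the similarity graph has more components than n_clusters,
    more than n_clusters groups are returned."""
    parent = list(range(n_knots))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    remaining = n_knots
    for (i, j), _ in sorted(similarity.items(), key=lambda kv: -kv[1]):
        if remaining <= n_clusters:
            break
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            remaining -= 1
    return [find(i) for i in range(n_knots)]


def skeleton_cluster_sketch(points, n_knots, n_clusters):
    """Label each point with the merged group of its nearest knot."""
    knots = kmeans_knots(points, n_knots)
    similarity = pair_similarity(points, knots)
    knot_labels = merge_knots(similarity, n_knots, n_clusters)
    return [
        knot_labels[min(range(n_knots), key=lambda i: math.dist(p, knots[i]))]
        for p in points
    ]
```

In this sketch the knots play the role of the concise intermediate representation: density is only ever evaluated between pairs of knots, using counts and distances rather than a full-dimensional density estimate, which is one plausible reading of "less dependent on the dimension". The number of knots would typically be set well above the number of clusters so that irregular shapes can be traced by chains of knots.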