Two important nonparametric approaches to clustering emerged in the 1970s: clustering by level sets, or the cluster tree, as proposed by Hartigan, and clustering by gradient lines, or the gradient flow, as proposed by Fukunaga and Hostetler. In a recent paper, we argued that these two approaches are fundamentally the same by showing that the gradient flow provides a way to move along the cluster tree. In making a stronger case, we are confronted with the fact that the cluster tree does not define a partition of the entire support of the underlying density, while the gradient flow does. In the present paper, we resolve this conundrum by proposing two ways of obtaining a partition from the cluster tree, each of them very natural in its own right, and showing that both reduce to the partition given by the gradient flow under standard assumptions on the sampling density.
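To make the contrast between the two approaches concrete, the following is a minimal sketch on a toy example, not the construction studied in the paper. It assumes a hypothetical one-dimensional density (an equal mixture of two unit-variance Gaussians centered at -2 and 2), a plain Euler discretization of the gradient flow, and a grid scan to find the connected components of an upper level set. All function names, the step size, and the grid bounds are illustrative choices.

```python
import math

# Hypothetical toy density: equal mixture of N(-2, 1) and N(2, 1).
MEANS = (-2.0, 2.0)
NORM = len(MEANS) * math.sqrt(2.0 * math.pi)


def density(x):
    """Value of the toy mixture density at x."""
    return sum(math.exp(-0.5 * (x - m) ** 2) for m in MEANS) / NORM


def gradient(x):
    """Derivative of the toy mixture density at x."""
    return sum(-(x - m) * math.exp(-0.5 * (x - m) ** 2) for m in MEANS) / NORM


def flow_to_mode(x, step=0.5, tol=1e-10, max_iter=100_000):
    """Euler discretization of the gradient flow x'(t) = f'(x(t)) started at x.

    Intended for starting points of moderate size (roughly |x| <= 4); far in
    the tails the gradient is so small that this simple scheme is very slow.
    """
    for _ in range(max_iter):
        g = gradient(x)
        if abs(g) < tol:
            break
        x += step * g
    return x


def gradient_flow_label(x):
    """Cluster label from the gradient flow: 0 for the left mode, 1 for the right."""
    return 0 if flow_to_mode(x) < 0.0 else 1


def level_set_label(x, level, lo=-6.0, hi=6.0, n=1201):
    """Index (left to right) of the connected component of {f >= level}
    containing x, or None if x lies outside the upper level set.

    Components are detected by scanning a grid on [lo, hi], so the result is
    only as accurate as the grid resolution.
    """
    if density(x) < level:
        return None
    h = (hi - lo) / (n - 1)
    component = -1
    inside = False
    for i in range(n):
        g = lo + i * h
        if g > x:
            break
        above = density(g) >= level
        if above and not inside:
            component += 1  # a new component starts at this grid point
        inside = above
    if not inside:
        component += 1  # x itself is the first point seen in its component
    return component
```

In this toy setting, `level_set_label(0.1, 0.1)` returns `None` because the point 0.1 falls below the level 0.1, whereas `gradient_flow_label(0.1)` still assigns it to the right-hand mode. This illustrates the gap described above: a single level set labels only part of the support, while the gradient flow labels every point that is not a critical point of the density.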