Finding a suitable density function is essential for density-based clustering algorithms such as DBSCAN and DPC. These algorithms commonly use a naive density corresponding to the indicator function of a unit $d$-dimensional Euclidean ball. Such a density struggles to capture local features in complex datasets. To address this issue, we propose a new kernel diffusion density function that adapts to data with varying local distributional characteristics and smoothness. Furthermore, we develop a surrogate that can be computed efficiently in linear time and space, and we prove that it is asymptotically equivalent to the kernel diffusion density function. Extensive experiments on benchmark and large-scale face image datasets show that the proposed approach not only achieves a significant improvement over classic density-based clustering algorithms but also outperforms state-of-the-art face clustering methods by a large margin.
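For illustration only, the naive ball-indicator density mentioned above can be sketched as follows. This is not the proposed kernel diffusion density, which the abstract does not define. The function name `naive_ball_density`, the `radius` parameter, and the normalization by the number of points are assumptions made for this example.

```python
import numpy as np


def naive_ball_density(points, radius=1.0):
    """Fraction of points within `radius` of each point (the point itself included).

    Hypothetical helper: a ball-indicator density estimate up to a constant
    volume factor, computed with a brute-force O(n^2) distance matrix.
    """
    points = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances via broadcasting: result has shape (n, n).
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Indicator of the ball: 1 if a neighbor lies within the radius, else 0.
    inside = dists <= radius
    return inside.sum(axis=1) / len(points)


if __name__ == "__main__":
    # Two nearby points and one far-away point (made-up toy data).
    toy = [[0.0, 0.0], [0.5, 0.0], [10.0, 10.0]]
    print(naive_ball_density(toy, radius=1.0))
```

Because every point uses the same fixed radius, dense and sparse regions are measured at the same scale, which is one way to read the limitation on local features described above.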