Face clustering is a promising way to scale up face recognition systems using large-scale unlabeled face images. However, it remains challenging to identify small or sparse face image clusters, which we call hard clusters. The difficulty is caused by the heterogeneity of the clusters, i.e., their high variations in size and sparsity. Consequently, the conventional approach of identifying clusters with a uniform threshold often severely misclassifies the samples that should belong to hard clusters. We tackle this problem by leveraging the neighborhood information of samples and inferring their cluster memberships in a probabilistic way. We introduce two novel modules, Neighborhood-Diffusion-based Density (NDDe) and Transition-Probability-based Distance (TPDi), with which we can simply apply the standard Density Peak Clustering algorithm with a uniform threshold. Our experiments on multiple benchmarks show that each module contributes to the final performance of our method, and that incorporating the two modules into other advanced face clustering methods boosts these methods to new state-of-the-art performance. Code is available at: https://github.com/echoanran/On-Mitigating-Hard-Clusters.
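To illustrate what "standard Density Peak Clustering with a uniform threshold" means, here is a minimal generic sketch. It is not the authors' implementation: the function name `density_peak_clustering` is hypothetical, and the `density` and `dist` inputs are placeholders where NDDe and TPDi would be plugged in (their actual definitions are not given in this abstract). Each sample is linked to its nearest neighbor of higher density; if that neighbor is farther than the single threshold `tau`, the sample instead becomes a new cluster peak. Ties in density are broken by index, which is an assumption of this sketch.

```python
from typing import List, Sequence


def density_peak_clustering(density: Sequence[float],
                            dist: Sequence[Sequence[float]],
                            tau: float) -> List[int]:
    """Generic Density Peak Clustering with one uniform distance threshold.

    density[i]: density estimate of sample i (placeholder for NDDe).
    dist[i][j]: pairwise distance between samples (placeholder for TPDi).
    tau: uniform threshold shared by all clusters.
    Returns one integer cluster label per sample.
    """
    n = len(density)
    parent = list(range(n))
    # Visit samples from highest to lowest density (ties broken by index).
    order = sorted(range(n), key=lambda i: (-density[i], i))
    for rank, i in enumerate(order):
        higher = order[:rank]  # samples ranked ahead of i, i.e. denser
        if not higher:
            continue  # the global density peak is its own parent
        # Nearest neighbor among the higher-density samples.
        j = min(higher, key=lambda k: dist[i][k])
        # Link to it only if it is within tau; otherwise i starts a new cluster.
        if dist[i][j] <= tau:
            parent[i] = j
    # Parents always precede children in `order`, so each parent's label
    # is already assigned when its children are visited.
    labels = [-1] * n
    next_label = 0
    for i in order:
        if parent[i] == i:
            labels[i] = next_label
            next_label += 1
        else:
            labels[i] = labels[parent[i]]
    return labels
```

For four 1-D points at 0, 1, 10 and 11 with densities `[2.0, 1.0, 2.5, 1.5]` and absolute-difference distances, `tau=3.0` yields `[1, 1, 0, 0]` (two clusters), while `tau=20.0` merges everything into `[0, 0, 0, 0]`. This shows why a single threshold is sensitive to the quality of the density and distance estimates: the abstract's argument is that better estimates (NDDe, TPDi) let one uniform `tau` also separate small or sparse clusters.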