Recent research in clustering face embeddings has found that unsupervised, shallow, heuristic-based methods -- including $k$-means and hierarchical agglomerative clustering -- underperform supervised, deep, inductive methods. While the reported improvements are indeed impressive, experiments are mostly limited to face datasets, where the clustered embeddings are highly discriminative or well-separated by class (Recall@1 above 90% and often nearing ceiling), and the experimental methodology seemingly favors the deep methods. We conduct a large-scale empirical study of 17 clustering methods across three datasets and obtain several robust findings. Notably, deep methods are surprisingly fragile for embeddings with more uncertainty, where they perform no better, and sometimes worse, than shallow, heuristic-based methods. When embeddings are highly discriminative, deep methods do outperform the baselines, consistent with past results, but the margin between methods is much smaller than previously reported. We believe our benchmarks broaden the scope of supervised clustering methods beyond the face domain and can serve as a foundation on which these methods could be improved. To enable reproducibility, we include all necessary details in the appendices, and plan to release the code.