In this work, we address the problem of large-scale online face clustering: given a continuous stream of unknown faces, create a database that groups the incoming faces by identity. The database must be updated every time a new face arrives, and the solution must be efficient, accurate and scalable. For this purpose, we present an online Gaussian mixture-based clustering method (OGMC). The key idea of this method is that an identity can be represented by more than one distribution or cluster. Using feature vectors (f-vectors) extracted from the incoming faces, OGMC generates clusters that may be connected to other clusters depending on their proximity and their robustness. Every time a cluster is updated with a new sample, its connections are also updated. With this approach, we reduce the dependency of the clustering process on the order and the size of the incoming data, and we are able to deal with complex data distributions. Experimental results show that the proposed approach outperforms state-of-the-art clustering methods on large-scale face clustering benchmarks, not only in accuracy but also in efficiency and scalability.
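The idea of representing one identity by several connected clusters can be illustrated with a minimal sketch. This is not the OGMC algorithm itself: it swaps the Gaussian components for running means with Euclidean thresholds, and it approximates "robustness" by a minimum sample count. The class name `OnlineLinkedClusters` and the parameters `assign_radius`, `link_radius` and `min_samples` are hypothetical and chosen only for illustration.

```python
import numpy as np


class OnlineLinkedClusters:
    """Toy online clustering where linked clusters share one identity.

    Assumption: running means and distance thresholds stand in for the
    Gaussian components described in the abstract.
    """

    def __init__(self, assign_radius, link_radius, min_samples):
        self.assign_radius = assign_radius  # max distance to join a cluster
        self.link_radius = link_radius      # max centroid distance for a link
        self.min_samples = min_samples      # hypothetical robustness criterion
        self.means = []    # one running mean per cluster
        self.counts = []   # number of samples per cluster
        self.links = []    # set of linked cluster indices per cluster

    def add(self, f_vector):
        """Insert one f-vector and return the index of its cluster."""
        x = np.asarray(f_vector, dtype=float)
        idx = self._nearest(x)
        if idx is None:
            # No cluster is close enough: start a new one.
            self.means.append(x.copy())
            self.counts.append(1)
            self.links.append(set())
            idx = len(self.means) - 1
        else:
            # Incremental mean update for the chosen cluster.
            self.counts[idx] += 1
            self.means[idx] += (x - self.means[idx]) / self.counts[idx]
        # The updated cluster has moved or grown, so refresh its connections.
        self._refresh_links(idx)
        return idx

    def _nearest(self, x):
        if not self.means:
            return None
        dists = np.linalg.norm(np.stack(self.means) - x, axis=1)
        best = int(np.argmin(dists))
        return best if dists[best] <= self.assign_radius else None

    def _refresh_links(self, idx):
        for other in range(len(self.means)):
            if other == idx:
                continue
            gap = np.linalg.norm(self.means[idx] - self.means[other])
            close = gap <= self.link_radius
            robust = min(self.counts[idx], self.counts[other]) >= self.min_samples
            if close and robust:
                self.links[idx].add(other)
                self.links[other].add(idx)
            else:
                self.links[idx].discard(other)
                self.links[other].discard(idx)

    def identities(self):
        """Label each cluster with the connected component it belongs to."""
        labels = [-1] * len(self.means)
        current = 0
        for start in range(len(self.means)):
            if labels[start] != -1:
                continue
            labels[start] = current
            stack = [start]
            while stack:
                node = stack.pop()
                for neighbour in self.links[node]:
                    if labels[neighbour] == -1:
                        labels[neighbour] = current
                        stack.append(neighbour)
            current += 1
        return labels
```

In this sketch, two nearby clusters are merged into one identity only once both have received enough samples, which is one simple way to make the grouping less sensitive to the order in which faces arrive.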