The objective of this paper is to learn a compact representation of image sets for template-based face recognition. We make the following contributions: first, we propose a network architecture which aggregates and embeds the face descriptors produced by deep convolutional neural networks into a compact fixed-length representation. This compact representation requires minimal memory storage and enables efficient similarity computation. Second, we propose a novel GhostVLAD layer that includes {\em ghost clusters}, which do not contribute to the aggregation. We show that a quality weighting on the input faces emerges automatically, such that informative images contribute more than those of low quality, and that the ghost clusters enhance the network's ability to deal with poor-quality images. Third, we explore how the input feature dimension, the number of clusters, and different training techniques affect the recognition performance. Based on this analysis, we train a network that far exceeds the state-of-the-art on the IJB-B face recognition dataset. This is currently one of the most challenging public benchmarks, and we surpass the state-of-the-art on both the identification and verification protocols.
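To make the ghost-cluster idea concrete, the following is a minimal PyTorch sketch of a GhostVLAD-style aggregation layer, written only to illustrate the mechanism described above; the class name \texttt{GhostVLAD} and parameters such as \texttt{num\_ghost} are illustrative choices, not the authors' reference implementation. Descriptors are soft-assigned over both real and ghost clusters, but only the real clusters are kept in the aggregated output, so low-quality faces can be absorbed by the ghost clusters and contribute less.

\begin{verbatim}
# Minimal sketch of a GhostVLAD-style layer (illustrative, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GhostVLAD(nn.Module):
    def __init__(self, feat_dim, num_clusters, num_ghost):
        super().__init__()
        self.K = num_clusters          # clusters kept in the output
        self.G = num_ghost             # ghost clusters, discarded after assignment
        # Soft-assignment is computed over K + G clusters.
        self.assign = nn.Linear(feat_dim, num_clusters + num_ghost)
        # Cluster centres for the K real clusters only.
        self.centers = nn.Parameter(torch.randn(num_clusters, feat_dim))

    def forward(self, x):
        # x: (B, N, D) -- a template of N face descriptors of dimension D.
        # Soft assignment over real + ghost clusters, then drop the ghost columns,
        # so uninformative faces can be assigned to ghosts and down-weighted.
        a = F.softmax(self.assign(x), dim=-1)[..., : self.K]      # (B, N, K)
        residuals = x.unsqueeze(2) - self.centers                 # (B, N, K, D)
        vlad = (a.unsqueeze(-1) * residuals).sum(dim=1)           # (B, K, D)
        vlad = F.normalize(vlad, dim=-1)                          # intra-normalization
        return F.normalize(vlad.flatten(1), dim=-1)               # (B, K*D)

# Example: aggregate templates of 8 face descriptors into compact vectors.
layer = GhostVLAD(feat_dim=128, num_clusters=8, num_ghost=1)
templates = torch.randn(2, 8, 128)
print(layer(templates).shape)  # torch.Size([2, 1024])
\end{verbatim}

In practice the flattened output would be projected to a lower-dimensional, L2-normalized template descriptor; the sketch omits that projection and any training objective.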