Recently, there has been growing interest in Transformers not only in NLP but also in computer vision. We wonder whether Transformers can be used for face recognition and whether they are better than CNNs. We therefore investigate the performance of Transformer models in face recognition. Considering that the original Transformer may neglect inter-patch information, we modify the patch generation process so that tokens are generated from sliding patches that overlap with each other. The models are trained on the CASIA-WebFace and MS-Celeb-1M databases and evaluated on several mainstream benchmarks, including the LFW, SLLFW, CALFW, CPLFW, TALFW, CFP-FP, AgeDB, and IJB-C databases. We demonstrate that Face Transformer models trained on a large-scale database, MS-Celeb-1M, achieve performance comparable to CNNs with a similar number of parameters and MACs. To facilitate further research, Face Transformer models and code are available at https://github.com/zhongyy/Face-Transformer.
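As a minimal sketch of the overlapping sliding-patch tokenization described above, assuming a PyTorch implementation: the module name, patch size, and stride below are illustrative and not taken from the repository; the idea is simply that a stride smaller than the patch size makes neighbouring patches share pixels, preserving inter-patch information before the linear projection into tokens.

```python
import torch
import torch.nn as nn

class OverlappingPatchEmbed(nn.Module):
    """Tokenize an image with sliding patches that overlap (stride < patch size)."""
    def __init__(self, img_size=112, patch_size=8, stride=4, in_chans=3, embed_dim=512):
        super().__init__()
        # Extract sliding patches; with stride < patch_size neighbouring
        # patches share pixels, unlike the non-overlapping ViT patches.
        self.unfold = nn.Unfold(kernel_size=patch_size, stride=stride)
        self.proj = nn.Linear(in_chans * patch_size * patch_size, embed_dim)
        self.num_patches = ((img_size - patch_size) // stride + 1) ** 2

    def forward(self, x):
        # x: (B, C, H, W) -> patches: (B, C*P*P, N) -> tokens: (B, N, D)
        patches = self.unfold(x).transpose(1, 2)
        return self.proj(patches)

# Example: a 112x112 face crop -> 27x27 = 729 overlapping tokens
tokens = OverlappingPatchEmbed()(torch.randn(1, 3, 112, 112))
print(tokens.shape)  # torch.Size([1, 729, 512])
```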