With the advent of 2-dimensional Convolutional Neural Networks (2D CNNs), face recognition accuracy has risen above 99%. However, face recognition remains a challenge in real-world conditions. Using a video rather than a single image as input can help address this challenge, because a video provides more features than an image. However, 2D CNNs cannot take advantage of the temporal features present in a video. We therefore propose a framework called $Sf_{3}CNN$ for face recognition in videos. The $Sf_{3}CNN$ framework uses a 3-dimensional Residual Network (3D ResNet) and A-Softmax loss. The 3D ResNet captures both spatial and temporal features in one compact feature map. However, the 3D CNN features must be highly discriminative for efficient face recognition, and the A-Softmax loss helps to extract such highly discriminative features from the video. The $Sf_{3}CNN$ framework achieves an accuracy of 99.10% on the CVBL video database, compared with the previous 97% on the same database using 3D ResNets.
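To make the role of the A-Softmax loss more concrete, the following is a minimal sketch of the angular-margin loss in its commonly published (SphereFace) form, written in plain Python. It is not the authors' implementation: the function names `psi` and `a_softmax_loss`, the margin `m=4`, and the toy feature and weight values are assumptions for illustration only, and the vector `x` merely stands in for an embedding that a 3D ResNet would produce from a video clip.

```python
import math

def psi(theta, m=4):
    """Angular margin function (assumed SphereFace form):
    (-1)^k * cos(m*theta) - 2k for theta in [k*pi/m, (k+1)*pi/m]."""
    k = min(int(theta * m / math.pi), m - 1)  # index of the angular interval
    return (-1) ** k * math.cos(m * theta) - 2 * k

def a_softmax_loss(x, weights, label, m=4):
    """A-Softmax loss for one feature vector x.
    weights: one weight vector per class (normalized here, no bias term).
    label:   index of the true class. m is a hypothetical margin choice."""
    x_norm = math.sqrt(sum(v * v for v in x))
    logits = []
    for j, w in enumerate(weights):
        w_norm = math.sqrt(sum(v * v for v in w))
        cos_t = sum(a * b for a, b in zip(x, w)) / (x_norm * w_norm)
        cos_t = max(-1.0, min(1.0, cos_t))  # guard against rounding error
        if j == label:
            # The true class uses the margin-penalized angle term.
            logits.append(x_norm * psi(math.acos(cos_t), m))
        else:
            logits.append(x_norm * cos_t)
    # Numerically stable log-sum-exp, then negative log-likelihood.
    mx = max(logits)
    lse = mx + math.log(sum(math.exp(l - mx) for l in logits))
    return lse - logits[label]

# Toy 2-D "embedding" and two class weight vectors (illustrative values).
x = [1.0, 0.2]
weights = [[1.0, 0.0], [0.0, 1.0]]
plain = a_softmax_loss(x, weights, label=0, m=1)   # m=1: no extra margin
margin = a_softmax_loss(x, weights, label=0, m=4)  # m=4: angular margin
```

With `m=1` the margin function reduces to `cos(theta)`, so the loss becomes an ordinary softmax loss over normalized weights. A larger `m` penalizes the same sample more heavily, which pushes features of the same identity toward a smaller angle to their class weight vector; this is the sense in which the loss encourages discriminative features.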