High-quality face images are required to guarantee the stability and reliability of automatic face recognition (FR) systems in surveillance and security scenarios. However, large amounts of face data are usually compressed before analysis because of transmission or storage limits. The compressed images may lose discriminative identity information, which degrades the performance of the FR system. Herein, we make the first attempt to study the just noticeable difference (JND) for the FR system, defined as the maximum distortion that the FR system cannot notice. More specifically, we establish a JND dataset comprising 3530 original images and 137,670 compressed images generated by VTM-15.0, the reference encoding/decoding software of the Versatile Video Coding (VVC) standard. Subsequently, we develop a novel JND prediction model that directly infers JND images for the FR system. In particular, to maximize redundancy removal without impairing robust identity information, we use an encoder with multiple feature extraction and attention-based feature decomposition modules that progressively decomposes face features into two uncorrelated components, i.e., identity and residual features, via self-supervised learning. The residual feature is then fed into the decoder to generate the residual map. Finally, the predicted JND map is obtained by subtracting the residual map from the original image. Experimental results demonstrate that the proposed model predicts JND maps more accurately than state-of-the-art JND models, and that it saves more bits than VTM-15.0 while maintaining the performance of the FR system.
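The two operations named above, locating the JND point and subtracting the residual map, can be illustrated with a minimal sketch. It assumes that each original image comes with a list of compressed versions ordered from least to most distorted (the dataset figures work out to 137,670 / 3530 = 39 compressed versions per original), and that an FR system reports a similarity score between the original and each version. The function names, the example scores, the threshold, and the clipping to an 8-bit range are hypothetical illustrations, not the labeling procedure or the network described in the abstract.

```python
from typing import List, Optional, Sequence


def find_jnd_index(similarities: Sequence[float], threshold: float) -> Optional[int]:
    """Return the index of the most distorted version the FR system still accepts.

    similarities[i] is an assumed FR similarity score between the original image
    and its i-th compressed version, ordered from least to most distorted.
    The scan stops at the first rejected version, so an isolated acceptance
    at a higher distortion level is not counted.
    """
    jnd_index = None
    for i, score in enumerate(similarities):
        if score < threshold:
            break
        jnd_index = i
    return jnd_index


def subtract_residual(original: List[List[int]], residual: List[List[int]]) -> List[List[int]]:
    """Pixel-wise `original - residual`, clipped to the 8-bit range [0, 255].

    The clipping is an assumption made here so the result is a valid image.
    """
    return [
        [min(255, max(0, o - r)) for o, r in zip(orig_row, res_row)]
        for orig_row, res_row in zip(original, residual)
    ]


# Hypothetical scores for six compression levels of one face image.
scores = [0.98, 0.95, 0.91, 0.84, 0.62, 0.40]
jnd = find_jnd_index(scores, threshold=0.8)

# Hypothetical 2x2 grayscale image and residual map.
original = [[120, 200], [10, 255]]
residual = [[5, -3], [20, 0]]
jnd_map = subtract_residual(original, residual)
```

With these made-up values, `jnd` is 3 (the fourth compression level is the last one the FR system accepts) and `jnd_map` is `[[115, 203], [0, 255]]`. In the proposed model the residual map comes from a learned decoder rather than being given, so this sketch covers only the final subtraction step.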