Face attribute evaluation plays an important role in video surveillance and face analysis. Although methods based on convolutional neural networks have made great progress, their convolutions inevitably process only one local neighborhood at a time. Moreover, existing methods mostly treat face attribute evaluation as a standalone multi-label classification task, ignoring the inherent relationship between semantic attributes and face identity information. In this paper, we propose a novel \textbf{trans}former-based representation method for \textbf{f}ace \textbf{a}ttribute evaluation (\textbf{TransFA}), which effectively enhances discriminative attribute representation learning in the context of the attention mechanism. A multi-branch transformer is employed to explore the inter-correlation between different attributes in similar semantic regions for attribute feature learning. In particular, a hierarchical identity-constraint attribute loss is designed to train the end-to-end architecture, further integrating discriminative face identity information to boost performance. Experimental results on multiple face attribute benchmarks demonstrate that the proposed TransFA achieves superior performance compared with state-of-the-art methods.
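As an illustration only, the idea of training attribute prediction jointly with an identity signal can be sketched as a weighted sum of a multi-label attribute loss, averaged over attribute branches, and an identity classification loss. The abstract does not specify the actual form of the hierarchical identity-constraint attribute loss, so the function names, the grouping of attributes into branches, and the `id_weight` parameter below are all assumptions and should not be read as TransFA's implementation.

```python
import math


def binary_cross_entropy(logits, targets):
    """Mean binary cross-entropy over one branch's attribute logits (targets are 0 or 1)."""
    total = 0.0
    for z, y in zip(logits, targets):
        # Numerically stable form: max(z, 0) - z*y + log(1 + exp(-|z|))
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def softmax_cross_entropy(logits, label):
    """Softmax cross-entropy of identity logits against one true identity index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def combined_loss(branch_logits, branch_targets, id_logits, id_label, id_weight=0.5):
    """Hypothetical joint objective (an assumption, not the paper's loss):
    mean attribute loss across branches plus a weighted identity loss."""
    attr = sum(
        binary_cross_entropy(l, t) for l, t in zip(branch_logits, branch_targets)
    ) / len(branch_logits)
    ident = softmax_cross_entropy(id_logits, id_label)
    return attr + id_weight * ident


# Example: two hypothetical attribute branches (e.g. eye region, mouth region)
# and three candidate identities.
loss = combined_loss(
    branch_logits=[[2.0, -1.0], [0.5]],
    branch_targets=[[1, 0], [1]],
    id_logits=[1.0, 0.2, -0.3],
    id_label=0,
)
print(f"joint loss: {loss:.4f}")
```

In this sketch, each inner list plays the role of one attribute branch, and `id_weight` controls how strongly the identity term influences training; setting it to zero recovers plain multi-label attribute classification.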