This paper presents a Refinement Pyramid Transformer (RePFormer) for robust facial landmark detection. Most facial landmark detectors focus on learning representative image features. However, these CNN-based feature representations are not robust enough to handle complex real-world scenarios because they ignore the internal structure of landmarks as well as the relations between landmarks and context. In this work, we formulate facial landmark detection as the task of refining landmark queries along pyramid memories. Specifically, a pyramid transformer head (PTH) is introduced to build both homologous relations among landmarks and heterologous relations between landmarks and cross-scale contexts. In addition, a dynamic landmark refinement (DLR) module is designed to decompose landmark regression into an end-to-end refinement procedure, in which the dynamically aggregated queries are transformed into residual coordinate predictions. Extensive experimental results on four facial landmark detection benchmarks and their various subsets demonstrate the superior performance and high robustness of our framework.
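To make the idea of refining landmark queries along pyramid memories more concrete, the following is a minimal sketch in plain Python and not the authors' implementation. It assumes single-head dot-product attention, one refinement stage per pyramid level, and offset heads passed in as ordinary callables; the names `attend`, `refine_landmarks`, `pyramid`, and `offset_heads` are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, memory):
    """Single-head dot-product attention of one query over a list of tokens."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, token)) / scale for token in memory]
    weights = softmax(scores)
    return [
        sum(w * token[d] for w, token in zip(weights, memory))
        for d in range(len(query))
    ]


def refine_landmarks(init_coords, queries, pyramid, offset_heads):
    """Refine landmark coordinates stage by stage (illustrative only).

    init_coords  : list of (x, y) starting positions, one per landmark
    queries      : list of query vectors, one per landmark
    pyramid      : list of memories, each a list of feature tokens (assumed
                   to share the query dimension)
    offset_heads : list of callables mapping a query vector to (dx, dy);
                   stand-ins for learned regression heads
    """
    coords = [tuple(c) for c in init_coords]
    history = [coords]
    for memory, head in zip(pyramid, offset_heads):
        # Landmark-to-landmark attention: each query looks at all queries.
        queries = [
            [q + a for q, a in zip(query, attend(query, queries))]
            for query in queries
        ]
        # Landmark-to-context attention: each query looks at this level's memory.
        queries = [
            [q + a for q, a in zip(query, attend(query, memory))]
            for query in queries
        ]
        # Predict a residual offset per landmark and add it to the estimate.
        coords = [
            (x + dx, y + dy)
            for (x, y), (dx, dy) in zip(coords, (head(q) for q in queries))
        ]
        history.append(coords)
    return coords, history
```

Each stage returns an offset rather than an absolute position, so the final estimate is the initial guess plus the sum of all predicted residuals. In a trained model the offset heads and attention projections would be learned; here they are left as plain functions so that the control flow stays visible.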