Facial landmark (FLM) estimation is a critical component in many face-related applications. In this work, we aim to optimize for both accuracy and speed and explore the trade-off between them. Our key observation is that not all faces are created equal: frontal faces with neutral expressions converge faster than faces with extreme poses or expressions. To differentiate among samples, we train our model to predict the regression error after each iteration. If the current estimate is accurate enough, we stop iterating, saving redundant iterations while keeping accuracy in check. We also observe that, because neighboring patches overlap, we can infer all facial landmarks from only a small number of patches without a major sacrifice in accuracy. Architecturally, we offer a multi-scale, patch-based, lightweight feature extractor with a fine-grained local patch attention module, which computes a weight for each patch according to the information in the patch itself and enhances the expressive power of the patch features. We analyze the patch attention data to infer where the model attends when regressing facial landmarks and compare it to face attention in humans. Our model runs in real-time on a mobile device GPU, with 95 Mega Multiply-Add (MMA) operations, outperforming all state-of-the-art methods under 1000 MMA, with a normalized mean error of 8.16 on the 300W challenging dataset.
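The early-stopping idea above can be sketched as a simple refinement loop. This is a hypothetical illustration, not the paper's implementation: `step` stands in for one regression iteration of the model, which returns both updated landmark coordinates and its own predicted normalized error for the current face, and the threshold value is an assumption for demonstration.

```python
import numpy as np

def refine_landmarks(step, init_landmarks, max_iters=4, err_threshold=0.05):
    """Iteratively refine landmarks, stopping early once the model's own
    predicted error says the current estimate is accurate enough.

    step: callable taking landmarks -> (updated landmarks, predicted error).
    Returns the final landmarks and the number of iterations actually run.
    """
    landmarks = init_landmarks
    for i in range(max_iters):
        landmarks, predicted_err = step(landmarks)
        if predicted_err < err_threshold:
            # Easy sample (e.g. frontal, neutral face): stop early,
            # saving the remaining redundant iterations.
            return landmarks, i + 1
    # Hard sample (extreme pose or expression): use the full budget.
    return landmarks, max_iters
```

Easy faces exit after one or two iterations while hard faces consume the full budget, which is exactly the accuracy/speed trade-off the abstract describes.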
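The local patch attention module can be sketched in the spirit described above: each patch's weight is derived from the patch's own feature vector and then used to rescale its features. The scoring projection `w`, the shapes, and the softmax normalization here are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def patch_attention(patch_feats, w):
    """Weight each patch by a score computed from the patch itself.

    patch_feats: (num_patches, feat_dim) array of per-patch features.
    w: (feat_dim,) scoring vector (a stand-in for a learned projection).
    Returns the reweighted features and the per-patch attention weights.
    """
    scores = patch_feats @ w               # one scalar score per patch
    weights = softmax(scores)              # normalize across patches
    return patch_feats * weights[:, None], weights
```

Because the weights are computed from the patch content alone, inspecting them directly shows which facial regions the model relies on, which is what enables the comparison to human face attention mentioned in the abstract.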