Existing head pose estimation (HPE) methods mainly focus on a single person with a pre-detected frontal head, which limits their application in real, complex multi-person scenarios. We argue that these single-person HPE methods are fragile and inefficient for Multi-Person Head Pose Estimation (MPHPE), since they rely on a separately trained face detector that cannot generalize well to full viewpoints, especially for heads whose face area is invisible. In this paper, we focus on the full-range MPHPE problem and propose a direct, end-to-end simple baseline named DirectMHP. Because no existing dataset is applicable to full-range MPHPE, we first construct two benchmarks by extracting ground-truth labels for head detection and head orientation from the public datasets AGORA and CMU Panoptic. These benchmarks are challenging because they contain many truncated, occluded, tiny, and unevenly illuminated human heads. We then design a novel end-to-end trainable one-stage network architecture that jointly regresses the locations and orientations of multiple heads to address the MPHPE problem. Specifically, we regard pose as an auxiliary attribute of the head and append it after the traditional object prediction. This flexible design accommodates arbitrary pose representations, such as Euler angles. We jointly optimize the two tasks by sharing features and using appropriate multiple losses. In this way, our method can implicitly benefit from more surrounding context to improve HPE accuracy while maintaining head detection performance. We present comprehensive comparisons with state-of-the-art single-person HPE methods on public benchmarks, as well as superior baseline results on our constructed MPHPE datasets. Datasets and code are released at https://github.com/hnuzhy/DirectMHP.
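The idea of appending pose as an auxiliary attribute after the object prediction, and training both tasks with a combined loss, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the DirectMHP code: the vector layout `[cx, cy, w, h, objectness, head_score, pitch, yaw, roll]`, the sigmoid mapping, the angle ranges in `ANGLE_RANGES`, the helper names `decode_prediction` and `joint_loss`, and the `pose_weight` value are all hypothetical.

```python
import math

# Hypothetical layout of one raw prediction vector (assumed for illustration):
# [cx, cy, w, h, objectness, head_score, pitch, yaw, roll]
BOX_DIMS = 4
SCORE_DIMS = 2
POSE_NAMES = ("pitch", "yaw", "roll")

# Assumed half-ranges in degrees for mapping normalized outputs to Euler angles.
ANGLE_RANGES = {"pitch": 90.0, "yaw": 180.0, "roll": 90.0}


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_prediction(raw):
    """Split one raw output vector into box, scores, and Euler angles."""
    expected = BOX_DIMS + SCORE_DIMS + len(POSE_NAMES)
    if len(raw) != expected:
        raise ValueError(f"expected {expected} values, got {len(raw)}")
    box = list(raw[:BOX_DIMS])
    objectness = sigmoid(raw[BOX_DIMS])
    head_score = sigmoid(raw[BOX_DIMS + 1])
    pose = {}
    for name, logit in zip(POSE_NAMES, raw[BOX_DIMS + SCORE_DIMS:]):
        # Map the (0, 1) sigmoid output to (-half_range, +half_range) degrees.
        pose[name] = (sigmoid(logit) - 0.5) * 2.0 * ANGLE_RANGES[name]
    return {"box": box, "objectness": objectness,
            "head_score": head_score, "pose": pose}


def joint_loss(det_loss, pred_pose, true_pose, pose_weight=0.1):
    """Detection loss plus a weighted mean squared error on the pose values."""
    mse = sum((p - t) ** 2 for p, t in zip(pred_pose, true_pose)) / len(pred_pose)
    return det_loss + pose_weight * mse


if __name__ == "__main__":
    out = decode_prediction([0.5, 0.5, 0.1, 0.1, 2.0, 1.5, 0.0, 0.3, -0.2])
    print(out["pose"])
```

Because the pose values simply occupy extra slots at the end of the vector, swapping Euler angles for another representation would, under these assumptions, only change `POSE_NAMES` and the decoding step, which is the flexibility the abstract describes.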