Human body orientation estimation (HBOE) is widely used in applications including robotics, surveillance, pedestrian analysis, and autonomous driving. Although many approaches have addressed the HBOE problem, from specific controlled scenes to challenging in-the-wild environments, they assume that human instances have already been detected and take a well-cropped sub-image as input. This setting is inefficient and error-prone in real applications, such as scenes with crowds of people. In this paper, we propose a single-stage, end-to-end trainable framework for tackling the HBOE problem for multiple persons. By integrating the prediction of bounding boxes and direction angles in one embedding, our method jointly estimates the location and orientation of all bodies in an image directly. Our key idea is to integrate the HBOE task into the multi-scale anchor channel predictions of persons, so that it benefits concurrently from the intermediate features involved. As a result, our approach naturally adapts to difficult instances involving low resolution and occlusion, as in object detection. We validate the efficiency and effectiveness of our method on the recently presented MEBOW benchmark with extensive experiments. In addition, we complete the annotation of ambiguous instances ignored by the MEBOW dataset and provide corresponding weak body-orientation labels, preserving the dataset's integrity and consistency to support studies of multi-person scenes. Our work is available at \url{https://github.com/hnuzhy/JointBDOE}.
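To make the idea of predicting a bounding box and a direction angle in one embedding more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical per-anchor vector of six raw logits `[cx, cy, w, h, obj, theta]`, a sigmoid applied to every channel, and an orientation channel that maps linearly onto `[0, 360)` degrees. The names `decode_anchor` and `angular_error`, the channel layout, and the `img_size` default are illustrative assumptions; the abstract specifies none of them.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a raw logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_anchor(raw, img_size: float = 640.0):
    """Decode one hypothetical anchor embedding into box, score, and angle.

    `raw` holds six logits [cx, cy, w, h, obj, theta]. This layout is an
    assumption made for illustration, not the layout used in the paper.
    """
    cx, cy, w, h, obj, theta = (sigmoid(v) for v in raw)
    # Box values are treated as fractions of the image size (assumed).
    box = (cx * img_size, cy * img_size, w * img_size, h * img_size)
    # The orientation channel is mapped from (0, 1) onto [0, 360) degrees.
    angle_deg = (theta * 360.0) % 360.0
    return box, obj, angle_deg


def angular_error(pred_deg: float, gt_deg: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    diff = abs(pred_deg - gt_deg) % 360.0
    return min(diff, 360.0 - diff)


if __name__ == "__main__":
    box, score, angle = decode_anchor([0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
    print(box, score, angle)
    print(angular_error(350.0, 10.0))
```

Because the box and the angle are read from the same vector, one forward pass yields both quantities for each candidate person without a separate cropping stage. The circular difference in `angular_error` matters because orientations of 350 and 10 degrees are 20 degrees apart, not 340.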