In this paper we introduce a novel method for estimating the head pose of people in single images from a small set of head keypoints. To this end, we propose a regression model that takes keypoints computed automatically by 2D pose estimation algorithms and outputs the head pose as yaw, pitch, and roll angles. Our model is simple to implement and more efficient than the state of the art (faster at inference and smaller in memory footprint), with comparable accuracy. Through an appropriately designed loss function, our method also provides a measure of the heteroscedastic uncertainty associated with each of the three angles; we show that error and uncertainty values are correlated, so this additional information may be used in subsequent computational steps. As an example application, we address social interaction analysis in images: we propose an algorithm that quantitatively estimates the level of interaction between people from their head poses and their mutual positions. The code is available at https://github.com/cantarinigiorgio/HHP-Net.
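To make the idea of a loss that yields per-angle heteroscedastic uncertainty concrete, the following is a minimal sketch of one common formulation, a Gaussian negative log-likelihood with a predicted log-variance per angle. It is an illustrative assumption, not necessarily the loss used in HHP-Net; the function name `heteroscedastic_loss` and its arguments are hypothetical.

```python
import math


def heteroscedastic_loss(pred_angles, log_vars, true_angles):
    """Gaussian negative log-likelihood with one predicted variance per angle.

    Assumed formulation (not taken from the paper):
        loss = mean_i [ 0.5 * exp(-s_i) * (pred_i - true_i)^2 + 0.5 * s_i ]
    where s_i = log(sigma_i^2) is the predicted log-variance of angle i.

    pred_angles, true_angles: (yaw, pitch, roll), e.g. in degrees.
    log_vars: predicted log-variances, one per angle.
    """
    total = 0.0
    for pred, s, true in zip(pred_angles, log_vars, true_angles):
        # exp(-s) down-weights the squared error when the model reports high
        # uncertainty; the +0.5*s term penalizes reporting it everywhere.
        total += 0.5 * math.exp(-s) * (pred - true) ** 2 + 0.5 * s
    return total / len(pred_angles)


# A 2-degree yaw error costs less when the predicted variance matches it.
confident = heteroscedastic_loss((2.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0))
uncertain = heteroscedastic_loss((2.0, 0.0, 0.0), (math.log(4.0), 0.0, 0.0), (0.0, 0.0, 0.0))
```

Under this formulation, the loss for a fixed error is minimized when the predicted variance equals the squared error, which is one way a correlation between error and reported uncertainty can arise.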