Demand is growing for lightweight multi-person pose estimation in emerging smart IoT applications. However, existing algorithms tend to have large model sizes and intense computational requirements, making them ill-suited for real-time applications and deployment on resource-constrained hardware. Lightweight, real-time approaches are exceedingly rare and come at the cost of inferior accuracy. In this paper, we present EfficientHRNet, a family of lightweight multi-person human pose estimators that run in real time on resource-constrained devices. By unifying recent advances in model scaling with high-resolution feature representations, EfficientHRNet produces highly accurate models while reducing computation enough to achieve real-time performance. The largest model comes within 4.4% of the accuracy of the current state of the art while having 1/3 the model size and 1/6 the computation, achieving 23 FPS on an NVIDIA Jetson Xavier. Compared to the top real-time approach, EfficientHRNet increases accuracy by 22% while achieving similar FPS at 1/3 the power. At every level, EfficientHRNet proves to be more computationally efficient than other bottom-up 2D human pose estimation approaches while achieving highly competitive accuracy.