Hand pose estimation is a fundamental task in many human-robot interaction applications. However, previous approaches suffer from unsatisfactory hand landmark predictions in real-world scenes and a high computational burden. In this paper, we propose a fast and accurate framework for hand pose estimation, dubbed "FastHand". Built on a lightweight encoder-decoder network architecture, FastHand meets the requirements of practical applications running on embedded devices. The encoder consists of deep layers with a small number of parameters, while the decoder exploits spatial location information to obtain more accurate results. Evaluation on two publicly available datasets demonstrates the improved performance of the proposed pipeline compared to other state-of-the-art approaches: FastHand achieves high accuracy while running at 25 frames per second on an NVIDIA Jetson TX2 graphics processing unit.
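To make the decoder's use of spatial location information concrete, the sketch below shows soft-argmax, a common way heatmap-based pose estimators turn a per-landmark heatmap into a sub-pixel (x, y) coordinate. The paper's exact decoding operator is not specified in the abstract, so treating it as soft-argmax is an assumption for illustration only.

```python
import math

def soft_argmax_2d(heatmap):
    """Convert a 2D landmark heatmap to a sub-pixel (x, y) coordinate.

    A common decoding step in heatmap-based pose estimators; whether
    FastHand uses exactly this operator is an assumption made here
    for illustration.
    """
    flat = [v for row in heatmap for v in row]
    m = max(flat)
    # Numerically stable softmax over all heatmap cells.
    exps = [math.exp(v - m) for v in flat]
    total = sum(exps)
    h, w = len(heatmap), len(heatmap[0])
    x = y = 0.0
    for i in range(h):
        for j in range(w):
            p = exps[i * w + j] / total
            x += p * j  # expected column index
            y += p * i  # expected row index
    return x, y

# A peaked 3x3 heatmap: the expected coordinate sits at the peak (1, 1).
hm = [[0.0, 1.0, 0.0],
      [1.0, 8.0, 1.0],
      [0.0, 1.0, 0.0]]
print(soft_argmax_2d(hm))  # → (1.0, 1.0)
```

Because the expectation is differentiable, unlike a hard argmax, such a decoder can be trained end-to-end with coordinate-level losses.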