Hand pose estimation is a fundamental task in many human-robot interaction applications. However, previous approaches suffer from unsatisfactory hand landmark predictions in real-world scenes and from a high computational burden. This paper proposes a fast and accurate framework for hand pose estimation, dubbed "FastHand". Built on a lightweight encoder-decoder network architecture, FastHand meets the requirements of practical applications running on embedded devices. The encoder consists of deep layers with a small number of parameters, while the decoder exploits spatial location information to obtain more accurate results. Evaluations on two publicly available datasets demonstrate the improved performance of the proposed pipeline compared to other state-of-the-art approaches. FastHand achieves high accuracy while reaching a speed of 25 frames per second on an NVIDIA Jetson TX2 graphics processing unit.
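The abstract does not specify which layers make the encoder "deep yet small in parameters"; a common lightweight-network technique (assumed here for illustration only, not confirmed as FastHand's design) is replacing standard convolutions with depthwise-separable ones. The sketch below compares the weight counts of the two layer types:

```python
# Hypothetical sketch: parameter counts of a standard 3x3 convolution versus a
# depthwise-separable one -- a typical trick behind lightweight encoders with
# "deep layers and a small number of parameters" (an assumption; the abstract
# does not state FastHand's exact layer types).

def standard_conv_params(c_in, c_out, k=3):
    # Each output channel learns a k x k filter over every input channel.
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k=3):
    # One k x k filter per input channel, then a 1x1 pointwise mixing step.
    return c_in * k * k + c_in * c_out

if __name__ == "__main__":
    dense = standard_conv_params(64, 128)        # 73728 weights
    light = depthwise_separable_params(64, 128)  # 8768 weights
    print(dense, light, round(dense / light, 1))
```

For a 64-to-128-channel layer, the separable variant uses roughly 8x fewer weights, which is why such layers are favored for embedded targets like the Jetson TX2.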