Hand gesture recognition constitutes the initial step in most methods related to human-robot interaction. There are two key challenges in this task. The first is achieving stable and accurate hand landmark predictions in real-world scenarios, while the second is keeping the forward-inference time low. In this paper, we propose a fast and accurate framework for hand pose estimation, dubbed "FastHand". Using a lightweight encoder-decoder network architecture, we fulfil the requirements of practical applications running on embedded devices. The encoder consists of deep layers with a small number of parameters, while the decoder makes use of spatial location information to obtain more accurate results. The evaluation was carried out on two publicly available datasets and demonstrates the improved performance of the proposed pipeline compared to other state-of-the-art approaches. FastHand achieves high accuracy while reaching a speed of 25 frames per second on an NVIDIA Jetson TX2 graphics processing unit.
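To make the notion of a lightweight encoder-decoder for hand landmark prediction concrete, the following is a minimal sketch in PyTorch. It is not the FastHand network described in the paper; the depthwise-separable blocks, layer widths, and 21-keypoint heatmap output are assumptions made purely for illustration of how a parameter-efficient encoder can be paired with an upsampling decoder that preserves spatial location information.

```python
# Illustrative sketch only: NOT the FastHand architecture. Layer sizes,
# depthwise-separable convolutions, and the 21-keypoint heatmap head are
# assumptions chosen to show a lightweight encoder-decoder design.
import torch
import torch.nn as nn


class DepthwiseSeparableConv(nn.Module):
    """Depthwise + pointwise convolution, a common trick to cut parameters."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, stride, 1, groups=in_ch, bias=False),
            nn.BatchNorm2d(in_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)


class TinyHandNet(nn.Module):
    """Encoder downsamples with few parameters; decoder upsamples and
    predicts one heatmap per hand keypoint (21 keypoints assumed here)."""
    def __init__(self, num_keypoints=21):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            DepthwiseSeparableConv(16, 32, stride=2),
            DepthwiseSeparableConv(32, 64, stride=2),
            DepthwiseSeparableConv(64, 128, stride=2),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, num_keypoints, 1),  # one heatmap per landmark
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))


if __name__ == "__main__":
    model = TinyHandNet()
    dummy = torch.randn(1, 3, 256, 256)   # single RGB frame
    heatmaps = model(dummy)                # shape: (1, 21, 64, 64)
    print(heatmaps.shape)
```

Landmark coordinates can then be read off as the argmax of each predicted heatmap; keeping the heatmaps at a reduced resolution is one simple way to trade localisation precision for the low inference latency targeted on embedded hardware such as the Jetson TX2.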