3D hand pose estimation from RGB images suffers from the difficulty of obtaining depth information. Therefore, a great deal of attention has been devoted to estimating 3D hand pose from 2D hand joints. In this paper, we leverage the advantages of spatial-temporal Graph Convolutional Neural Networks and propose LG-Hand, a powerful method for 3D hand pose estimation. Our method incorporates both spatial and temporal dependencies into a single process. We argue that kinematic information plays an important role, contributing to the performance of 3D hand pose estimation. We therefore introduce two new objective functions, Angle loss and Direction loss, to take the hand structure into account. While Angle loss covers local kinematic information, Direction loss handles global kinematic information. Our LG-Hand achieves promising results on the First-Person Hand Action Benchmark (FPHAB) dataset. We also perform an ablation study to show the efficacy of the two proposed objective functions.