In this paper, we address the problem of estimating hand pose from the egocentric view when the hand is interacting with objects. Specifically, we propose a method to label a dataset, Ego-Siam, which contains pairwise egocentric images. We then use the collected pairwise data to train our encoder-decoder network, an architecture that has been shown to be efficient in prior work; the pairwise data brings additional training efficiency and testing accuracy. Our network is lightweight and runs at over 30 FPS even on an outdated GPU. We demonstrate that our method outperforms Mueller et al., the state-of-the-art work on egocentric hand-object interaction, on the GANerated dataset. To show that our method preserves semantic information, we also report grasp type classification performance on the GUN-71 dataset, where we outperform the benchmark using only the predicted 3D hand pose.