This report describes our first-place solution to the ECCV 2022 challenge on Human Body, Hands, and Activities (HBHA) from Egocentric and Multi-view Cameras (hand pose estimation). The goal of this challenge is to estimate global 3D hand poses from an input image, captured from an egocentric viewpoint, in which two hands and an object interact. Our proposed method performs end-to-end multi-hand pose estimation with a transformer architecture. In particular, it estimates hand poses robustly when the two hands interact with each other. Additionally, we propose an algorithm that takes hand scale into account to robustly estimate absolute depth. The algorithm works well even when hand sizes vary from person to person. Our method attains errors of 14.4 mm (left hand) and 15.9 mm (right hand) on the test set.
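
The abstract does not spell out the depth algorithm, so the following is only a minimal illustrative sketch of why hand scale matters for absolute depth. It assumes a pinhole camera model, and the function name `absolute_depth_from_scale` and all numeric values are hypothetical, not taken from the report. Under that assumption, a hand segment of real length L at depth z projects to roughly f * L / z pixels, so the same image observation implies different depths for different hand sizes.

```python
def absolute_depth_from_scale(focal_px: float, real_len_mm: float, pixel_len: float) -> float:
    """Depth (mm) from the pinhole relation pixel_len = focal_px * real_len_mm / depth.

    Assumes the measured segment is roughly parallel to the image plane.
    """
    if pixel_len <= 0:
        raise ValueError("pixel_len must be positive")
    return focal_px * real_len_mm / pixel_len


# Hypothetical values: the same 60-pixel palm observed for two different hand sizes.
focal_px = 600.0
pixel_len = 60.0
depth_small_hand = absolute_depth_from_scale(focal_px, 70.0, pixel_len)  # 70 mm palm
depth_large_hand = absolute_depth_from_scale(focal_px, 90.0, pixel_len)  # 90 mm palm

print(depth_small_hand, depth_large_hand)
```

In this toy setting, assuming a single average hand size for every person would bias the depth estimate in proportion to the size mismatch, which is the kind of ambiguity a scale-aware method has to resolve.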