We propose a method for annotating images of a hand manipulating an object with the 3D poses of both the hand and the object, together with a dataset created using this method. Annotated real images are currently scarce for this problem, as estimating the 3D poses is challenging, mostly because of the mutual occlusions between the hand and the object. To tackle this challenge, we capture sequences with one or several RGB-D cameras and jointly optimize the 3D hand and object poses over all the frames \emph{simultaneously}. This allows us to automatically annotate each frame with accurate pose estimates, despite large mutual occlusions. With this method, we created \datasetname, the first markerless dataset of color images with 3D annotations of both hand and object. The dataset currently comprises 80,000 frames from 65 sequences covering 10 persons and 10 objects, is still growing, and will be made publicly available upon publication. We also use it to train a deep network for RGB-based single-frame hand pose estimation, providing a baseline on our dataset.
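To make the joint optimization concrete, a minimal sketch of one possible per-sequence objective is given below. The specific energy terms and the smoothness weight $\lambda$ are illustrative assumptions, not the definitive formulation: $\theta_t$ denotes the hand pose parameters and $\omega_t$ the 6D object pose in frame $t$, and $D_{t,c}$ the RGB-D observation from camera $c$ in that frame,
\begin{equation*}
\min_{\{\theta_t,\,\omega_t\}_{t=1}^{T}}\;
\sum_{t=1}^{T} \sum_{c=1}^{C}
E_{\text{data}}\!\left(\theta_t, \omega_t;\, D_{t,c}\right)
\;+\; \lambda \sum_{t=2}^{T}
\left( \left\lVert \theta_t - \theta_{t-1} \right\rVert^2
+ \left\lVert \omega_t - \omega_{t-1} \right\rVert^2 \right),
\end{equation*}
where $E_{\text{data}}$ measures the discrepancy between the rendered hand and object models and the observed RGB-D data. The essential point is that all $T$ frames are optimized jointly rather than independently, so frames in which the hand or object is clearly visible propagate constraints, through the temporal coupling term, to frames with heavy mutual occlusion.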