Understanding social interactions from first-person views is crucial for many applications, ranging from assistive robotics to AR/VR. A first step toward reasoning about interactions is understanding human pose and shape. However, research in this area is currently hindered by a lack of data. Existing datasets are limited in size, annotations, ground-truth capture modalities, or the diversity of interactions. We address this shortcoming by proposing EgoBody, a novel large-scale dataset for social interactions in complex 3D scenes. We employ Microsoft HoloLens2 headsets to record rich egocentric data streams (including RGB, depth, eye gaze, and head and hand tracking). To obtain accurate 3D ground truth, we calibrate the headset with a multi-Kinect rig and fit expressive SMPL-X body meshes to multi-view RGB-D frames, reconstructing 3D human poses and shapes relative to the scene. We collect 68 sequences spanning diverse sociological interaction categories, and propose the first benchmark for 3D full-body pose and shape estimation from egocentric views. Our dataset and code will be available for research at https://sanweiliti.github.io/egobody/egobody.html.
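To make the ground-truth representation concrete, the following is a minimal sketch of generating an SMPL-X body mesh from pose and shape parameters, assuming the open-source `smplx` Python package and a locally downloaded copy of the SMPL-X model files (the `models` path is a hypothetical local setup, not part of the EgoBody release). The fitting loop itself is only described in comments; this is an illustration of the body model, not the paper's actual optimization pipeline.

```python
# Minimal sketch: producing an SMPL-X mesh from pose/shape parameters,
# using the open-source `smplx` package (pip install smplx). Model files
# must be obtained separately from https://smpl-x.is.tue.mpg.de.
import torch
import smplx

# `model_path` points to a directory holding the SMPL-X model files
# (a local-setup assumption for this sketch).
model = smplx.create(
    model_path='models',   # hypothetical local path
    model_type='smplx',
    gender='neutral',
    use_pca=False,         # full hand articulation instead of PCA hand poses
)

batch_size = 1
params = {
    'betas': torch.zeros(batch_size, 10),         # body shape coefficients
    'global_orient': torch.zeros(batch_size, 3),  # root rotation (axis-angle)
    'body_pose': torch.zeros(batch_size, 63),     # 21 body joints x 3 (axis-angle)
}

output = model(**params, return_verts=True)
vertices = output.vertices  # (1, 10475, 3) mesh vertices in the body frame
joints = output.joints      # 3D joint locations, typical fitting targets

# In a multi-view RGB-D fitting pipeline like the one the abstract describes,
# these parameters would be optimized so that projected joints and vertices
# match 2D keypoints and depth observations across calibrated Kinect views.
```

SMPL-X is a natural choice here because it parameterizes body, hand, and facial articulation in a single expressive mesh, which is relevant for interaction data where gestures and gaze carry social signal.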