Understanding social interactions from egocentric views is crucial for many applications, ranging from assistive robotics to AR/VR. Key to reasoning about interactions is to understand the body pose and motion of the interaction partner from the egocentric view. However, research in this area is severely hindered by the lack of datasets. Existing datasets are limited in terms of either size, capture/annotation modalities, ground-truth quality, or interaction diversity. We fill this gap by proposing EgoBody, a novel large-scale dataset for human pose, shape and motion estimation from egocentric views, during interactions in complex 3D scenes. We employ Microsoft HoloLens2 headsets to record rich egocentric data streams (including RGB, depth, eye gaze, head and hand tracking). To obtain accurate 3D ground truth, we calibrate the headset with a multi-Kinect rig and fit expressive SMPL-X body meshes to multi-view RGB-D frames, reconstructing 3D human shapes and poses relative to the scene, over time. We collect 125 sequences, spanning diverse interaction scenarios, and propose the first benchmark for 3D full-body pose and shape estimation of the social partner from egocentric views. We extensively evaluate state-of-the-art methods, highlight their limitations in the egocentric scenario, and address such limitations leveraging our high-quality annotations. Data and code are available at https://sanweiliti.github.io/egobody/egobody.html.