Automatic perception of human behaviors during social interactions is crucial for AR/VR applications, and an essential component is the estimation of plausible 3D human pose and shape of our social partners from the egocentric view. One of the biggest challenges of this task is severe body truncation due to close social distances in egocentric scenarios, which introduces large pose ambiguities for unseen body parts. To tackle this challenge, we propose a novel scene-conditioned diffusion method to model the body pose distribution. Conditioned on the 3D scene geometry, the diffusion model generates bodies in plausible human-scene interactions, with the sampling guided by a physics-based collision score to further resolve human-scene inter-penetrations. Classifier-free training enables flexible sampling with different conditions and enhanced diversity. A visibility-aware graph convolution model, guided by per-joint visibility, serves as the diffusion denoiser to incorporate inter-joint dependencies and per-body-part control. Extensive evaluations show that our method generates bodies in plausible interactions with 3D scenes, achieving both superior accuracy for visible joints and diversity for invisible body parts. The code will be available at https://sanweiliti.github.io/egohmr/egohmr.html.
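To make the collision-guided sampling concrete, the sketch below shows one reverse-diffusion step in which the sampled pose is pushed along the negative gradient of a physics-based penetration penalty. This is a minimal illustration only, not the authors' released implementation: `denoiser`, `decode_body`, `scene_sdf`, and `guidance_scale` are hypothetical placeholders, and the scene geometry is assumed to be available as a signed distance function.

```python
import torch

def collision_score(body_verts, scene_sdf):
    """Hypothetical physics-based penalty: sums the penetration depth of
    body vertices that fall inside the scene (negative signed distance)."""
    sdf_vals = scene_sdf(body_verts)        # (V,) signed distance per vertex
    return torch.relu(-sdf_vals).sum()      # positive where the body penetrates

def guided_sampling_step(x_t, t, denoiser, scene_feats, decode_body,
                         scene_sdf, guidance_scale=0.1):
    """One reverse-diffusion step with collision guidance (sketch).

    x_t         : noisy body-pose parameters at diffusion step t
    denoiser    : scene-conditioned network predicting the clean pose
    decode_body : maps pose parameters to body-mesh vertices in scene coordinates
    """
    x_t = x_t.detach().requires_grad_(True)
    x0_pred = denoiser(x_t, t, scene_feats)      # predicted clean pose
    verts = decode_body(x0_pred)                 # posed body mesh in the scene
    score = collision_score(verts, scene_sdf)    # human-scene penetration
    grad, = torch.autograd.grad(score, x_t)      # d(penetration) / d(pose)
    # Shift the sample against the gradient so the next denoising step
    # starts from a less penetrating configuration.
    return (x_t - guidance_scale * grad).detach()
```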
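The classifier-free training mentioned above implies a standard classifier-free guidance rule at sampling time: the same denoiser is evaluated with and without the scene condition, and the two predictions are blended. A minimal sketch under that assumption, again with hypothetical names:

```python
def cfg_denoise(x_t, t, denoiser, scene_feats, w=1.5):
    """Classifier-free guidance (sketch): query one network twice.

    Assumes the denoiser was trained with the scene condition randomly
    dropped, so passing None yields an unconditional prediction.
    """
    pred_cond = denoiser(x_t, t, scene_feats)   # scene-conditioned prediction
    pred_uncond = denoiser(x_t, t, None)        # condition dropped
    return pred_uncond + w * (pred_cond - pred_uncond)
```

Varying `w` trades scene-conditioned fidelity against sample diversity, which is one way to realize the flexible sampling with different conditions that the abstract describes.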