While human motion estimation from on-body devices is crucial for applications such as XR interaction, existing methods often suffer from poor wearability, expensive hardware, and cumbersome calibration, which hinder their adoption in daily life. To address these challenges, we present EveryWear, a lightweight and practical human motion capture approach based entirely on everyday wearables: a smartphone, a smartwatch, earbuds, and smart glasses equipped with one forward-facing and two downward-facing cameras, requiring no explicit calibration before use. We introduce Ego-Elec, a 9-hour real-world dataset covering 56 daily activities across 17 diverse indoor and outdoor environments, with ground-truth 3D annotations provided by a motion capture (MoCap) system, to facilitate robust research and benchmarking in this direction. Our approach employs a multimodal teacher-student framework that integrates visual cues from the egocentric cameras with inertial signals from the consumer devices. By training directly on real-world data rather than synthetic data, our model avoids the sim-to-real gap that constrains prior work. Experiments demonstrate that our method outperforms baseline models, validating its effectiveness for practical full-body motion estimation.
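To make the multimodal teacher-student idea concrete, the sketch below shows one plausible way a student network could fuse egocentric image features with IMU signals from the wearables and be trained against both MoCap ground truth and a frozen teacher's predictions. This is a minimal illustration under assumed module names, feature sizes, and loss weights; it is not the authors' implementation.

```python
# Illustrative sketch only: a student model fusing visual and inertial
# features, trained with ground-truth supervision plus a teacher-matching
# (distillation) term. All dimensions and names are assumptions.
import torch
import torch.nn as nn

class PoseStudent(nn.Module):
    """Fuses per-frame visual and inertial features into joint rotations."""
    def __init__(self, img_feat_dim=512, imu_dim=48, num_joints=22):
        super().__init__()
        self.visual_proj = nn.Linear(img_feat_dim, 256)   # egocentric camera features
        self.imu_encoder = nn.Sequential(                  # phone/watch/earbuds/glasses IMUs
            nn.Linear(imu_dim, 128), nn.ReLU(), nn.Linear(128, 256)
        )
        self.fusion = nn.GRU(input_size=512, hidden_size=512, batch_first=True)
        self.pose_head = nn.Linear(512, num_joints * 6)    # 6D rotation per joint

    def forward(self, img_feats, imu_signals):
        # img_feats:   (B, T, img_feat_dim) precomputed egocentric features
        # imu_signals: (B, T, imu_dim) accelerations/orientations from the wearables
        x = torch.cat([self.visual_proj(img_feats),
                       self.imu_encoder(imu_signals)], dim=-1)
        h, _ = self.fusion(x)
        return self.pose_head(h)                           # (B, T, num_joints * 6)

def distillation_step(student, teacher, batch, alpha=0.5):
    """One training step: MoCap ground-truth loss plus teacher-matching loss."""
    pred = student(batch["img_feats"], batch["imu"])
    with torch.no_grad():
        teacher_pred = teacher(batch["img_feats"], batch["imu"])
    loss_gt = nn.functional.mse_loss(pred, batch["gt_pose"])   # real-world MoCap labels
    loss_kd = nn.functional.mse_loss(pred, teacher_pred)       # imitate the teacher
    return loss_gt + alpha * loss_kd

# Usage sketch with random tensors standing in for real Ego-Elec samples.
if __name__ == "__main__":
    student, teacher = PoseStudent(), PoseStudent()
    teacher.eval()
    batch = {
        "img_feats": torch.randn(2, 30, 512),
        "imu": torch.randn(2, 30, 48),
        "gt_pose": torch.randn(2, 30, 22 * 6),
    }
    loss = distillation_step(student, teacher, batch)
    loss.backward()
    print(f"total loss: {loss.item():.4f}")
```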