We propose Human-centered 4D Scene Capture (HSC4D) to accurately and efficiently create a dynamic digital world containing large-scale indoor-outdoor scenes, diverse human motions, and rich interactions between humans and environments. Using only body-mounted IMUs and LiDAR, HSC4D is space-free, with no constraints from external devices, and map-free, with no need for pre-built maps. IMUs can capture human poses but drift over long capture periods, while LiDAR is stable for global localization but coarse for local positions and orientations; HSC4D therefore makes the two sensors complement each other through a joint optimization and achieves promising results for long-term capture. Relationships between humans and environments are also explored to make their interactions more realistic. To facilitate downstream tasks such as AR, VR, robotics, and autonomous driving, we propose a dataset containing three large scenes (1k-5k $m^2$) with accurate dynamic human motions and locations. Diverse scenarios (climbing gym, multi-story building, slope, etc.) and challenging human activities (exercising, walking up/down stairs, climbing, etc.) demonstrate the effectiveness and generalization ability of HSC4D. The dataset and code are available at http://www.lidarhumanmotion.net/hsc4d/.