Fine-grained capturing of 3D HOI boosts human activity understanding and facilitates downstream visual tasks, including action recognition, holistic scene reconstruction, and human motion synthesis. Despite its significance, existing works mostly assume that humans interact with rigid objects using only a few body parts, limiting their scope. In this paper, we address the challenging problem of f-AHOI, wherein whole human bodies interact with articulated objects, whose parts are connected by movable joints. We present CHAIRS, a large-scale motion-captured f-AHOI dataset, consisting of 16.2 hours of versatile interactions between 46 participants and 81 articulated and rigid sittable objects. CHAIRS provides 3D meshes of both humans and articulated objects throughout the entire interactive process, as well as realistic and physically plausible full-body interactions. We show the value of CHAIRS with object pose estimation. By learning the geometrical relationships in HOI, we devise the very first model that leverages human pose estimation to tackle the estimation of articulated object poses and shapes during whole-body interactions. Given an image and an estimated human pose, our model first reconstructs the pose and shape of the object, then optimizes the reconstruction according to a learned interaction prior. Under both evaluation settings (i.e., with or without knowledge of the objects' geometries/structures), our model significantly outperforms baselines. We hope CHAIRS will push the community toward finer-grained interaction understanding. We will make the data/code publicly available.
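The abstract describes a two-stage inference pipeline: a regression stage that predicts the articulated object's pose and shape from the image and the estimated human pose, followed by an optimization stage guided by a learned interaction prior. The sketch below is only an illustrative reading of that description, not the authors' implementation; all module names, network architectures, and dimensions (e.g., ObjectPoseRegressor, InteractionPrior, the 6D root pose plus joint-angle parameterization) are hypothetical assumptions.

```python
# Hypothetical sketch of the two-stage pipeline described in the abstract:
# (1) regress the articulated object's pose/shape from image features and the
#     estimated human pose, (2) refine that estimate against a learned
#     interaction prior. Names and dimensions are illustrative only.
import torch
import torch.nn as nn


class ObjectPoseRegressor(nn.Module):
    """Stage 1: predict object parameters (root pose, joint angles, shape code)."""
    def __init__(self, img_feat_dim=512, human_pose_dim=72, n_parts=4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(img_feat_dim + human_pose_dim, 256), nn.ReLU(),
            nn.Linear(256, 6 + n_parts + 10),  # 6D root pose + joint angles + shape code
        )

    def forward(self, img_feat, human_pose):
        return self.mlp(torch.cat([img_feat, human_pose], dim=-1))


class InteractionPrior(nn.Module):
    """Stage 2: score the plausibility of a human-object configuration."""
    def __init__(self, human_pose_dim=72, obj_param_dim=20):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(human_pose_dim + obj_param_dim, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, human_pose, obj_params):
        return self.net(torch.cat([human_pose, obj_params], dim=-1))


def estimate_object(img_feat, human_pose, regressor, prior, steps=50, lr=1e-2):
    """Regress object parameters, then refine them under the interaction prior."""
    obj_params = regressor(img_feat, human_pose).detach().clone().requires_grad_(True)
    init = obj_params.detach().clone()
    opt = torch.optim.Adam([obj_params], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        # Stay close to the regressed estimate while maximizing plausibility.
        loss = ((obj_params - init) ** 2).sum() - prior(human_pose, obj_params).mean()
        loss.backward()
        opt.step()
    return obj_params.detach()
```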