We consider the challenging task of multi-person 3D body mesh estimation in this work. Existing methods are mostly two-stage: one stage for person localization and the other for individual body mesh estimation. This leads to redundant pipelines with high computation cost and degraded performance in complex scenes (e.g., with occluded person instances). In this work, we present a single-stage model, Body Meshes as Points (BMP), to simplify the pipeline and improve both efficiency and performance. In particular, BMP adopts a new representation of multiple person instances as points in a spatial-depth space, where each point is associated with one body mesh. Building on this representation, BMP can directly predict body meshes for multiple persons in a single stage by concurrently localizing person instance points and estimating the corresponding body meshes. To better reason about the depth ordering of all persons within the same scene, BMP employs a simple yet effective inter-instance ordinal depth loss that yields depth-coherent body mesh estimates. BMP also introduces a novel keypoint-aware augmentation to enhance model robustness to occluded person instances. Comprehensive experiments on the Panoptic, MuPoTS-3D, and 3DPW benchmarks clearly demonstrate the state-of-the-art efficiency of BMP for multi-person body mesh estimation, together with outstanding accuracy. Code can be found at: https://github.com/jfzhang95/BMP.
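To make the inter-instance ordinal depth loss concrete, the sketch below shows one common pairwise ranking formulation of such a loss. This is an illustrative assumption, not necessarily the exact loss used in BMP: it penalizes every pair of person instances whose predicted root-depth ordering contradicts the ground-truth ordering, via a smooth log-ranking term.

```python
import math

def ordinal_depth_loss(pred_depths, gt_depths):
    """Pairwise ordinal (ranking) depth loss over person instances.

    A hypothetical sketch of an inter-instance ordinal depth loss:
    for each pair (i, j) with a ground-truth depth ordering, add
    log(1 + exp(-sign * (d_i - d_j))), which is small when the
    predicted ordering agrees with the ground truth and grows when
    it is violated.

    pred_depths, gt_depths: sequences of per-person root depths.
    """
    n = len(pred_depths)
    loss, count = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            diff_gt = gt_depths[i] - gt_depths[j]
            if diff_gt == 0:
                continue  # equal depths impose no ordering constraint
            sign = 1.0 if diff_gt > 0 else -1.0
            # small loss if pred ordering matches gt, large otherwise
            loss += math.log1p(math.exp(-sign * (pred_depths[i] - pred_depths[j])))
            count += 1
    return loss / max(count, 1)
```

With correctly ordered predictions the loss stays below log(2); swapping two instances' depths pushes the corresponding pairwise terms above log(2), so the loss encourages depth-coherent estimates across the scene.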