While voxel-based methods have achieved promising results for multi-person 3D pose estimation from multiple cameras, they suffer from a heavy computational burden, especially in large scenes. We present Faster VoxelPose to address this challenge by re-projecting the feature volume onto the three two-dimensional coordinate planes and estimating the X, Y, and Z coordinates from them separately. To that end, we first localize each person with a 3D bounding box, estimating its 2D box and height from the volume features projected onto the xy-plane and the z-axis, respectively. Then, for each person, we estimate partial joint coordinates from the three coordinate planes separately and fuse them to obtain the final 3D pose. The method is free from costly 3D-CNNs, improves the speed of VoxelPose by ten times, and achieves accuracy competitive with state-of-the-art methods, demonstrating its potential for real-time applications.
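To illustrate the re-projection step described above, here is a minimal sketch of collapsing a 3D feature volume onto the three coordinate planes. It assumes the projection operator is a max over the collapsed axis (one plausible choice; the paper's exact operator and network details may differ), and the shapes and function name are illustrative only.

```python
import numpy as np

def project_volume(volume):
    """Re-project a 3D feature volume onto the three coordinate planes.

    volume: array of shape (C, X, Y, Z) holding per-voxel features.
    Returns (xy, xz, yz) 2D feature maps, each obtained by taking the
    maximum along the collapsed axis. 2D networks can then regress
    partial coordinates from each plane instead of running a 3D-CNN
    over the full volume.
    """
    xy = volume.max(axis=3)  # collapse Z -> shape (C, X, Y)
    xz = volume.max(axis=2)  # collapse Y -> shape (C, X, Z)
    yz = volume.max(axis=1)  # collapse X -> shape (C, Y, Z)
    return xy, xz, yz

# Toy example with hypothetical dimensions: 32 channels, 80x80x20 voxels.
rng = np.random.default_rng(0)
vol = rng.standard_normal((32, 80, 80, 20))
xy, xz, yz = project_volume(vol)
print(xy.shape, xz.shape, yz.shape)  # (32, 80, 80) (32, 80, 20) (32, 80, 20)
```

The speedup intuition is that the 2D maps grow quadratically with scene size while the full volume grows cubically, so per-plane 2D convolutions remain cheap even for large scenes.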