The process of tracking human anatomy in computer vision is referred to as pose estimation, and it is used in fields ranging from gaming to surveillance. Three-dimensional pose estimation traditionally requires advanced equipment, such as multiple linked intensity cameras or high-resolution time-of-flight cameras, to produce depth images. However, there are applications, e.g.~consumer electronics, where significant constraints are placed on the size, power consumption, weight and cost of the usable technology. Here, we demonstrate that computational imaging methods can achieve accurate pose estimation and overcome the apparent limitations of time-of-flight sensors designed for much simpler tasks. The sensor we use is already widely integrated in consumer-grade mobile devices, and despite its low spatial resolution of only 4$\times$4 pixels, our proposed Pixels2Pose system transforms its data into accurate depth maps and 3D pose data of multiple people up to a distance of 3 m from the sensor. We are able to generate depth maps at a resolution of 32$\times$32 and localize body parts in 3D with an error of only $\approx$10 cm, at a frame rate of 7 fps. This work opens up promising real-life applications in scenarios that were previously restricted by the advanced hardware requirements and cost of time-of-flight technology.