Recent advances in computer vision have brought growing prominence to applications that use neural networks to understand human poses. However, while accuracy on state-of-the-art datasets has steadily increased, these datasets often do not address the challenges seen in real-world applications: people distant from the camera, people in crowds, and heavily occluded people. As a result, many real-world applications are trained on data that does not reflect the data encountered in deployment, leading to significant underperformance. This article presents ADG-Pose, a method for automatically generating datasets for real-world human pose estimation. These datasets can be customized to set the distributions of person distance, crowdedness, and occlusion. Models trained with our method are able to perform in the presence of these challenges where those trained on other datasets fail. Using ADG-Pose, end-to-end accuracy for real-world skeleton-based action recognition sees a 20% increase on scenes with moderate distance and occlusion levels, and a 4x increase on distant scenes where other models failed to perform better than random.