The growing popularity of remote fitness has increased the demand for highly accurate computer vision models that track human poses. However, the best methods still fail in many real-world fitness scenarios, suggesting that there is a domain gap between current datasets and real-world fitness data. To enable the field to address fitness-specific vision problems, we created InfiniteForm, an open-source synthetic dataset of 60k images with diverse fitness poses (15 categories), both single- and multi-person scenes, and realistic variation in lighting, camera angles, and occlusions. As a synthetic dataset, InfiniteForm offers minimal bias in body shape and skin tone, and provides pixel-perfect labels for standard annotations like 2D keypoints, as well as those that are difficult or impossible for humans to produce like depth and occlusion. In addition, we introduce a novel generative procedure for creating diverse synthetic poses from predefined exercise categories. This generative process can be extended to any application where pose diversity is needed to train robust computer vision models.