With its continuously growing popularity around the world, fitness activity analysis has become an emerging research topic in computer vision. While a variety of new tasks and algorithms have been proposed recently, there is a growing demand for data resources that offer high-quality data, fine-grained labels, and diverse environments. In this paper, we present FLAG3D, a large-scale 3D fitness activity dataset with language instruction, containing 180K sequences of 60 categories. FLAG3D features the following three aspects: 1) accurate and dense 3D human poses captured with an advanced MoCap system to handle complex activities and large movements, 2) detailed and professional language instructions describing how to perform a specific activity, and 3) versatile video resources from a high-tech MoCap system, rendering software, and cost-effective smartphones in natural environments. Extensive experiments and in-depth analysis show that FLAG3D offers great research value for various challenges, such as cross-domain human action recognition, dynamic human mesh recovery, and language-guided human action generation. Our dataset and source code will be publicly available at https://andytang15.github.io/FLAG3D.