Autonomous robotic systems operating in human environments must understand their surroundings to make accurate and safe decisions. In crowded scenes involving close-range human-robot interaction and robot navigation, this understanding requires reasoning about human motion and body dynamics over time through human body pose estimation and tracking. However, existing datasets either do not provide pose annotations or include scene types unrelated to robotic applications. Many datasets also lack the diversity of poses and occlusions found in crowded human scenes. To address these limitations, we introduce JRDB-Pose, a large-scale dataset and benchmark for multi-person pose estimation and tracking using videos captured from a social navigation robot. The dataset contains challenging scenes in crowded indoor and outdoor locations, with a diverse range of scales and occlusion types. JRDB-Pose provides human pose annotations with per-keypoint occlusion labels and track IDs that are consistent across the scene. A public evaluation server is available for fair evaluation on a held-out test set. JRDB-Pose is available at https://jrdb.erc.monash.edu/.