Human pose estimation (HPE) with convolutional neural networks (CNNs) for indoor monitoring is one of the major challenges in computer vision. In contrast to HPE in perspective views, an indoor monitoring system can use a single omnidirectional camera with a 180° field of view, so the pose of a person can be detected with only one sensor per room. Keypoint detection is an essential upstream step for recognizing human pose. We propose a new dataset for training and evaluating CNNs on keypoint detection in omnidirectional images. The training dataset, THEODORE+, consists of 50,000 images and was created with a 3D rendering engine in which humans walk randomly through an indoor environment. In a dynamically created 3D scene, persons move randomly while the omnidirectional camera moves simultaneously, which yields synthetic RGB images with 2D and 3D ground truth. For evaluation, we captured and annotated the real-world PoseFES dataset, which comprises two scenarios and 701 frames with up to eight persons per scene. We propose four training paradigms to fine-tune or re-train two top-down models in MMPose and two bottom-up models in CenterNet on THEODORE+. Besides a qualitative evaluation, we report quantitative results. Compared to a COCO-pretrained baseline, we achieve significant improvements on the PoseFES dataset, especially for top-view scenes. Our datasets can be found at https://www.tu-chemnitz.de/etit/dst/forschung/comp_vision/datasets/index.php.en.
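To illustrate how 3D keypoint ground truth might relate to 2D keypoint ground truth in an omnidirectional image, the following is a minimal sketch. It assumes an ideal equidistant fisheye model (r = f · θ), which the abstract does not specify; the function name `project_equidistant` and its parameters (`cx`, `cy`, `radius_px`, `fov_deg`) are hypothetical and are not taken from THEODORE+ or PoseFES.

```python
import math


def project_equidistant(point, cx, cy, radius_px, fov_deg=180.0):
    """Project a 3D point in camera coordinates onto a fisheye image.

    Assumption: ideal equidistant model r = f * theta, with the optical
    axis along +z and the image circle of radius `radius_px` centred at
    (cx, cy). Returns None if the point is outside the field of view.
    """
    x, y, z = point
    # Angle between the viewing ray and the optical axis.
    theta = math.atan2(math.hypot(x, y), z)
    half_fov = math.radians(fov_deg) / 2.0
    if theta > half_fov:
        return None
    # Focal length in pixels per radian, so that theta = half_fov
    # lands exactly on the border of the image circle.
    f = radius_px / half_fov
    r = f * theta
    # Azimuth of the point around the optical axis.
    phi = math.atan2(y, x)
    return (cx + r * math.cos(phi), cy + r * math.sin(phi))


# Hypothetical 3D keypoints (metres, camera coordinates) of one person.
keypoints_3d = {
    "head": (0.0, 0.0, 1.0),
    "left_hand": (1.0, 0.0, 1.0),
}
keypoints_2d = {
    name: project_equidistant(p, cx=512.0, cy=512.0, radius_px=512.0)
    for name, p in keypoints_3d.items()
}
```

Under this assumed model, a point on the optical axis maps to the image centre, and a point 45° off-axis maps to half the image-circle radius. Real omnidirectional lenses deviate from the ideal model, so a calibrated camera model would be needed in practice.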