Existing 3D human pose estimation algorithms trained on distortion-free datasets suffer a performance drop when applied to new scenarios with a specific camera distortion. In this paper, we propose a simple yet effective model for 3D human pose estimation in video that can quickly adapt to any distortion environment by utilizing MAML, a representative optimization-based meta-learning algorithm. We treat a sequence of 2D keypoints under a particular distortion as a single MAML task. Because no large-scale dataset captured in a distorted environment is available, we propose an efficient method to generate synthetic distorted data from undistorted 2D keypoints. For evaluation, we consider two practical testing situations, depending on whether a motion capture sensor is available. In particular, we propose Inference Stage Optimization, which uses bone-length symmetry and consistency. Extensive evaluation shows that the proposed method successfully adapts to various degrees of distortion in the testing phase and outperforms existing state-of-the-art approaches. The proposed method is useful in practice because it requires neither camera calibration nor additional computation in the testing setup.
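The two ingredients named above, synthetic distortion of 2D keypoints and a bone-length symmetry and consistency objective, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the one-parameter polynomial radial distortion model, the absolute-difference symmetry term, the variance-based consistency term, and the hypothetical function names `distort_keypoints` and `bone_length_loss`. The sketch only evaluates the objective; it does not implement MAML or the optimization step itself.

```python
import math


def distort_keypoints(keypoints, k1, center=(0.0, 0.0)):
    """Apply a one-parameter radial distortion to normalized 2D keypoints.

    Assumed model (illustrative, not from the paper):
        p_d = c + (p - c) * (1 + k1 * r^2),
    where r is the distance of p from the distortion center c.
    """
    cx, cy = center
    distorted = []
    for x, y in keypoints:
        dx, dy = x - cx, y - cy
        scale = 1.0 + k1 * (dx * dx + dy * dy)
        distorted.append((cx + dx * scale, cy + dy * scale))
    return distorted


def bone_length(pose, bone):
    """Euclidean length of a bone given as a (parent, child) joint index pair."""
    parent, child = bone
    return math.dist(pose[parent], pose[child])


def bone_length_loss(poses, symmetric_pairs):
    """Symmetry plus temporal-consistency penalty on predicted 3D poses.

    poses: list of frames, each a list of (x, y, z) joint positions.
    symmetric_pairs: list of (left_bone, right_bone); each bone is a
        (parent, child) pair of joint indices (hypothetical skeleton layout).
    """
    num_frames = len(poses)

    # Symmetry: left and right counterparts should be equally long in each frame.
    symmetry = 0.0
    for pose in poses:
        for left, right in symmetric_pairs:
            symmetry += abs(bone_length(pose, left) - bone_length(pose, right))
    symmetry /= num_frames * len(symmetric_pairs)

    # Consistency: each bone's length should not change over the sequence,
    # measured here as the variance of its length across frames.
    bones = [bone for pair in symmetric_pairs for bone in pair]
    consistency = 0.0
    for bone in bones:
        lengths = [bone_length(pose, bone) for pose in poses]
        mean = sum(lengths) / num_frames
        consistency += sum((length - mean) ** 2 for length in lengths) / num_frames
    consistency /= len(bones)

    return symmetry + consistency
```

Because the objective depends only on the predicted poses, it needs no ground-truth 3D annotation, which is what would make a term of this kind usable when no motion capture sensor is available at test time.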