Capturing challenging human motions is critical for numerous applications, but in the monocular setting it is hampered by complex motion patterns and severe self-occlusion. In this paper, we propose ChallenCap, a template-based approach that captures challenging 3D human motions from a single RGB camera in a novel learning-and-optimization framework, with the aid of multi-modal references. We propose a hybrid motion inference stage with a generation network, which uses a temporal encoder-decoder to extract motion details from the pair-wise sparse-view references, together with a motion discriminator that uses the unpaired marker-based references to extract the characteristics of specific challenging motions in a data-driven manner. We further adopt a robust motion optimization stage to increase tracking accuracy by jointly utilizing the learned motion details from the supervised multi-modal references and the reliable motion hints from the input image reference. Extensive experiments on our new challenging motion dataset demonstrate the effectiveness and robustness of our approach in capturing challenging human motions.