We propose an end-to-end pipeline for both building and tracking 3D facial models from personalized in-the-wild video data (cellphone, webcam, YouTube clips, etc.). First, we present a method for automatic data curation and retrieval based on a hierarchical clustering framework typical of collision detection algorithms in traditional computer graphics pipelines. Next, we use synthetic turntables and deepfake technology to build a synthetic multi-view stereo pipeline for appearance capture that is robust to imperfect synthetic geometry and image misalignment. The resulting model is fitted with an animation rig, which is then used to track facial performances. Notably, our novel use of deepfake technology enables robust tracking of in-the-wild data with differentiable renderers despite a significant synthetic-to-real domain gap. Finally, we outline how we train a motion capture regressor, leveraging these techniques to avoid the need for real-world ground truth data and/or a high-end calibrated camera capture setup.
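To make the data retrieval idea concrete, the following is a minimal sketch of a bounding-volume-style hierarchy of the kind used in collision detection, applied here to frame lookup. It is an illustration under stated assumptions, not the method described in this work: the abstract does not specify the per-frame descriptor or the splitting rule, so the descriptor (for example, a hypothetical head-pose vector per frame), the median split along the widest axis, and the names `Node`, `build`, and `nearest` are all our own placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]  # hypothetical per-frame descriptor, e.g. (yaw, pitch)


@dataclass
class Node:
    lo: Point                      # per-axis minimum of the bounding box
    hi: Point                      # per-axis maximum of the bounding box
    indices: List[int]             # frame indices (non-empty only in leaves)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(points: Sequence[Point], indices: List[int], leaf_size: int = 2) -> Node:
    """Recursively cluster frames by splitting at the median of the widest axis."""
    dims = len(points[0])
    lo = tuple(min(points[i][d] for i in indices) for d in range(dims))
    hi = tuple(max(points[i][d] for i in indices) for d in range(dims))
    if len(indices) <= leaf_size:
        return Node(lo, hi, list(indices))
    axis = max(range(dims), key=lambda d: hi[d] - lo[d])
    ordered = sorted(indices, key=lambda i: points[i][axis])
    mid = len(ordered) // 2
    return Node(lo, hi, [],
                build(points, ordered[:mid], leaf_size),
                build(points, ordered[mid:], leaf_size))


def box_dist2(node: Node, q: Point) -> float:
    """Squared distance from q to the node's box (0 if q lies inside it)."""
    return sum(max(l - x, 0.0, x - h) ** 2 for l, h, x in zip(node.lo, node.hi, q))


def nearest(node: Node, points: Sequence[Point], q: Point,
            best: Tuple[float, int] = (float("inf"), -1)) -> Tuple[float, int]:
    """Return (squared distance, frame index) of the stored frame closest to q."""
    if box_dist2(node, q) >= best[0]:
        return best                # the whole cluster is too far away: prune it
    if node.left is None:          # leaf: test its frames directly
        for i in node.indices:
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], q))
            if d2 < best[0]:
                best = (d2, i)
        return best
    # Visit the closer child first so the farther one is more likely to be pruned.
    for child in sorted((node.left, node.right), key=lambda c: box_dist2(c, q)):
        best = nearest(child, points, q, best)
    return best
```

As a usage example, `tree = build(frames, list(range(len(frames))))` followed by `nearest(tree, frames, query)` returns the index of the stored frame whose descriptor is closest to `query`. Entire clusters are skipped whenever their bounding box is already farther away than the best match found so far, which is the same pruning that makes such hierarchies attractive for collision queries.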