We present a learning-based method for building driving-signal aware full-body avatars. Our model is a conditional variational autoencoder that can be animated with incomplete driving signals, such as human pose and facial keypoints, and produces a high-quality representation of human geometry and view-dependent appearance. The core intuition behind our method is that better drivability and generalization can be achieved by disentangling the driving signals and remaining generative factors, which are not available during animation. To this end, we explicitly account for information deficiency in the driving signal by introducing a latent space that exclusively captures the remaining information, thus enabling the imputation of the missing factors required during full-body animation, while remaining faithful to the driving signal. We also propose a learnable localized compression for the driving signal which promotes better generalization, and helps minimize the influence of global chance-correlations often found in real datasets. For a given driving signal, the resulting variational model produces a compact space of uncertainty for missing factors that allows for an imputation strategy best suited to a particular application. We demonstrate the efficacy of our approach on the challenging problem of full-body animation for virtual telepresence with driving signals acquired from minimal sensors placed in the environment and mounted on a VR-headset.
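To make the disentanglement concrete, here is a minimal NumPy sketch of the core idea: the decoder is conditioned on the observed driving signal, while a separate latent code imputes the factors the driving signal does not carry. The dimensions and the random placeholder weights are hypothetical; in the actual model these would be learned network parameters, and the latent prior is what enables sampling plausible completions at animation time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "decoder": maps a driving signal d (observed, e.g. pose and facial
# keypoints) plus a latent code z (imputed missing factors) to a full-body
# state. Weights are random placeholders standing in for learned parameters.
D_DIM, Z_DIM, OUT_DIM = 4, 2, 6
W_d = rng.normal(size=(OUT_DIM, D_DIM))
W_z = rng.normal(size=(OUT_DIM, Z_DIM))

def decode(d, z):
    # The output depends on both the driving signal and the latent factors.
    return np.tanh(W_d @ d + W_z @ z)

d = rng.normal(size=D_DIM)  # one fixed driving signal

# Sampling z from the prior imputes different plausible completions
# for the same driving signal.
samples = [decode(d, rng.standard_normal(Z_DIM)) for _ in range(8)]
assert not np.allclose(samples[0], samples[1])

# With z held fixed, the output is deterministic in the driving signal,
# i.e. the animation stays faithful to what was actually observed.
z0 = np.zeros(Z_DIM)
assert np.allclose(decode(d, z0), decode(d, z0))
```

The variational model additionally shapes the distribution over `z` given `d`, so an application can choose an imputation strategy (e.g. the mean, or samples) over this compact space of uncertainty.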