Recent work has shown great progress in building photorealistic animatable full-body codec avatars, but these avatars still face difficulties in generating high-fidelity animation of clothing. To address these difficulties, we propose a method to build an animatable clothed-body avatar with an explicit representation of the clothing on the upper body from multi-view captured videos. We use a two-layer mesh representation to separately register the 3D scans with templates. To improve the photometric correspondence across different frames, texture alignment is then performed through inverse rendering of the clothing geometry and texture predicted by a variational autoencoder. We then train a new two-layer codec avatar with separate modeling of the upper clothing and the inner body layer. To learn the interaction between the body dynamics and clothing states, we use a temporal convolution network to predict the clothing latent code from a sequence of input skeletal poses. We show photorealistic animation output for three different actors, and demonstrate the advantage of our clothed-body avatars over the single-layer avatars of previous work. We also show the benefit of an explicit clothing model, which allows the clothing texture to be edited in the animation output.
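The pose-driven clothing animation step can be illustrated with a minimal sketch: a causal temporal convolution maps a sequence of skeletal poses to a per-frame clothing latent code. This is not the paper's implementation; all dimensions (`pose_dim`, `hidden`, `latent_dim`), kernel sizes, and the two-layer structure are hypothetical choices for illustration, written in plain numpy.

```python
import numpy as np

def temporal_conv1d(x, w, b):
    """Causal 1D convolution over time.

    x: (T, C_in) sequence, w: (K, C_in, C_out) kernel, b: (C_out,) bias.
    Left-padding with zeros ensures the output at frame t depends only
    on frames <= t, so the latent code never peeks at future poses.
    """
    T, c_in = x.shape
    k, _, c_out = w.shape
    xp = np.concatenate([np.zeros((k - 1, c_in)), x], axis=0)
    out = np.zeros((T, c_out))
    for t in range(T):
        window = xp[t:t + k]                        # (K, C_in) temporal window
        out[t] = np.einsum('kc,kco->o', window, w) + b
    return out

rng = np.random.default_rng(0)
# Hypothetical sizes: 16 frames, 63-D pose vector, 8-D clothing latent.
T, pose_dim, hidden, latent_dim = 16, 63, 32, 8
poses = rng.standard_normal((T, pose_dim))          # input skeletal pose sequence
w1 = rng.standard_normal((3, pose_dim, hidden)) * 0.1
b1 = np.zeros(hidden)
w2 = rng.standard_normal((3, hidden, latent_dim)) * 0.1
b2 = np.zeros(latent_dim)

h = np.maximum(temporal_conv1d(poses, w1, b1), 0.0)  # ReLU hidden features
z = temporal_conv1d(h, w2, b2)                       # per-frame clothing latent code
print(z.shape)  # (16, 8)
```

In the full system, each predicted latent `z[t]` would be decoded by the clothing variational autoencoder into clothing geometry and texture; stacking causal temporal convolutions lets the clothing state depend on the recent history of body motion rather than a single pose.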