We have recently seen great progress in building photorealistic animatable full-body codec avatars, but generating high-fidelity animation of clothing is still difficult. To address these difficulties, we propose a method to build an animatable clothed-body avatar with an explicit representation of the upper-body clothing from multi-view captured videos. We use a two-layer mesh representation to register each 3D scan separately with the body and clothing templates. To improve the photometric correspondence across different frames, texture alignment is then performed through inverse rendering of the clothing geometry and texture predicted by a variational autoencoder. We then train a new two-layer codec avatar with separate modeling of the upper clothing and the inner body layer. To learn the interaction between the body dynamics and clothing states, we use a temporal convolution network to predict the clothing latent code from a sequence of input skeletal poses. We show photorealistic animation output for three different actors, and demonstrate the advantage of our clothed-body avatars over the single-layer avatars used in previous work. We also show the benefit of an explicit clothing model that allows the clothing texture to be edited in the animation output.
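To make the pose-driven animation step concrete, below is a minimal sketch in PyTorch of a temporal convolution network that maps a short window of skeletal poses to a clothing latent code. All dimensions, layer counts, and names (PoseToClothingCode, pose_dim, latent_dim) are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class PoseToClothingCode(nn.Module):
    """Predict a clothing latent code from a window of skeletal poses."""

    def __init__(self, pose_dim=72, latent_dim=128, hidden=256):
        super().__init__()
        # 1D convolutions over the time axis aggregate the pose history.
        self.tcn = nn.Sequential(
            nn.Conv1d(pose_dim, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # Pool over time, then project to the clothing latent space.
        self.head = nn.Linear(hidden, latent_dim)

    def forward(self, poses):
        # poses: (batch, window, pose_dim); Conv1d expects channels first.
        features = self.tcn(poses.transpose(1, 2))   # (batch, hidden, window)
        return self.head(features.mean(dim=-1))      # (batch, latent_dim)

# Usage: a batch of 4 sequences, each a 16-frame window of 72-DoF poses.
model = PoseToClothingCode()
codes = model(torch.randn(4, 16, 72))
print(codes.shape)  # torch.Size([4, 128])
```

In the pipeline described above, a clothing VAE decoder (not shown here) would then map such latent codes to clothing geometry and texture for rendering.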