We propose a Transformer-based framework for 3D human texture estimation from a single image. The proposed Transformer effectively exploits global information in the input image, overcoming a limitation of existing methods built solely on convolutional neural networks. In addition, we propose a mask-fusion strategy that combines the advantages of RGB-based and texture-flow-based models. We further introduce a part-style loss that helps reconstruct high-fidelity colors without introducing unpleasant artifacts. Extensive experiments demonstrate, both quantitatively and qualitatively, that the proposed method outperforms state-of-the-art 3D human texture estimation approaches.