Domain adaptation of 3D portraits has gained more and more attention.However, the transfer mechanism of existing methods is mainly based on vision or language, which ignores the potential of vision-language combined guidance. In this paper, we propose an Image-Text coupled 3D portraits domain adaptation framework, namely Image and Text portrait (ITportrait). ITportrait relies on a two-stage alternating training strategy. In the first stage, we employ a 3D Artistic Paired Transfer (APT) method for image-guided style transfer. APT constructs paired photo-realistic portraits to obtain accurate artistic poses, which helps ITportrait to achieve high-quality 3D style transfer. In the second stage, we propose a 3D Image-Text Embedding (ITE) approach in the CLIP space. ITE uses a threshold function to adaptively control the optimization direction of image or text in the CLIP space. Comprehensive quantitative and qualitative results show that our ITportrait achieves state-of-the-art (SOTA) results and benefits downstream tasks. All source codes and pre-trained models will be released to the public.
翻译:暂无翻译