The manipulation of latent space has recently attracted considerable interest in the field of generative models. Recent research shows that latent directions can be used to manipulate images towards certain attributes. However, controlling the generation process of 3D generative models remains a challenge. In this work, we propose a novel 3D manipulation method that can manipulate both the shape and texture of the model using text- or image-based prompts such as 'a young face' or 'a surprised face'. We leverage the Contrastive Language-Image Pre-training (CLIP) model and a pre-trained 3D GAN model designed to generate face avatars, and create a fully differentiable rendering pipeline to manipulate meshes. More specifically, our method takes an input latent code and modifies it such that the target attribute specified by a text or image prompt is present or enhanced, while leaving other attributes largely unaffected. Our method requires only 5 minutes per manipulation, and we demonstrate the effectiveness of our approach with extensive results and comparisons.
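The latent-code optimization described above can be sketched as follows. This is a minimal, self-contained toy in NumPy, not the paper's implementation: the linear `RENDER` and `ENCODE` maps stand in for the 3D GAN's differentiable rendering pipeline and the frozen CLIP image encoder, `text_emb` stands in for CLIP's embedding of the prompt, and a finite-difference gradient replaces autograd. All names and hyperparameters here are illustrative assumptions. The loss combines a CLIP-style cosine term pulling the rendered result toward the prompt with an L2 term keeping the edited latent close to the original, so other attributes stay largely unaffected.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (assumptions): a linear "renderer" and a linear "image
# encoder". In the real pipeline these are the pre-trained 3D GAN with a
# differentiable renderer and the frozen CLIP image encoder.
D_LATENT, D_IMG, D_EMB = 8, 16, 4
RENDER = rng.normal(size=(D_IMG, D_LATENT))
ENCODE = rng.normal(size=(D_EMB, D_IMG))

def embed(w):
    """Render the latent code and map the result into the joint embedding space."""
    return ENCODE @ (RENDER @ w)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def manipulate(w0, text_emb, steps=200, lr=0.05, lam=0.1, eps=1e-4):
    """Optimize the latent so its embedding matches the prompt embedding,
    while an L2 penalty keeps it close to the original latent (so other
    attributes are largely preserved)."""
    w = w0.copy()

    def loss(w):
        clip_term = 1.0 - cosine(embed(w), text_emb)   # match the prompt
        identity_term = lam * np.sum((w - w0) ** 2)    # stay near the original
        return clip_term + identity_term

    for _ in range(steps):
        # Finite-difference gradient; a real implementation uses autograd
        # through the differentiable rendering pipeline.
        g = np.zeros_like(w)
        for i in range(len(w)):
            d = np.zeros_like(w)
            d[i] = eps
            g[i] = (loss(w + d) - loss(w - d)) / (2 * eps)
        w -= lr * g
    return w

w0 = rng.normal(size=D_LATENT)          # latent code of the input avatar
text_emb = rng.normal(size=D_EMB)       # stands in for the prompt's embedding
w_edit = manipulate(w0, text_emb)
print(cosine(embed(w0), text_emb), cosine(embed(w_edit), text_emb))
```

After optimization, the edited latent's embedding is closer (in cosine similarity) to the prompt embedding than the original latent's was, while the L2 term bounds how far the code drifts from its starting point.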