Character auto-creation systems have achieved great success in recent popular Role-Playing Games (RPGs). A bone-driven face model controlled by continuous parameters (such as bone positions) and discrete parameters (such as hairstyles) makes it possible for users to personalize and customize in-game characters. Previous in-game character auto-creation systems are mostly image-driven, where facial parameters are optimized so that the rendered character looks similar to a reference face photo. This paper proposes a novel text-to-parameter translation method (T2P) to achieve zero-shot text-driven game character auto-creation. With our method, users can create a vivid in-game character from an arbitrary text description without using any reference photo or manually editing hundreds of parameters. Leveraging the power of the large-scale pre-trained multi-modal model CLIP and neural rendering, T2P searches both continuous and discrete facial parameters in a unified framework. Due to their discontinuous representation, discrete facial parameters are difficult for previous methods to learn effectively; to the best of our knowledge, T2P is the first method that can handle the optimization of both discrete and continuous parameters. Experimental results show that T2P can generate high-quality and vivid game characters from given text prompts, and that it outperforms other state-of-the-art (SOTA) text-to-3D generation methods in both objective and subjective evaluations.
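The core idea of searching continuous and discrete parameters under one scoring function can be sketched with a toy example. The sketch below is purely illustrative and is not the paper's implementation: the `similarity` function is a hypothetical stand-in for the CLIP score between a neurally rendered face and the text prompt, and the hard-coded targets are invented for demonstration. Continuous parameters are optimized by (finite-difference) gradient ascent, while the discrete parameter is handled by enumeration, since it admits no gradient.

```python
import itertools

# Hypothetical "ideal" parameters; in T2P the score would instead come from
# CLIP comparing a rendered character against the text prompt.
TARGET_CONT = [0.3, -0.7, 0.5]  # e.g. bone positions (continuous)
TARGET_DISC = 2                 # e.g. a hairstyle id (discrete)

def similarity(cont, disc):
    """Toy stand-in for a CLIP text-image similarity score (higher is better)."""
    cont_score = -sum((c - t) ** 2 for c, t in zip(cont, TARGET_CONT))
    disc_score = 0.0 if disc == TARGET_DISC else -1.0
    return cont_score + disc_score

def optimize(n_disc=4, steps=200, lr=0.1, eps=1e-3):
    """Unified search: enumerate discrete choices, gradient-ascend continuous ones."""
    best = None
    for disc in range(n_disc):               # discrete parameter: exhaustive search
        cont = [0.0, 0.0, 0.0]
        for _ in range(steps):               # continuous parameters: gradient ascent
            grad = []
            for i in range(len(cont)):       # forward finite-difference gradient
                bumped = list(cont)
                bumped[i] += eps
                grad.append((similarity(bumped, disc) - similarity(cont, disc)) / eps)
            cont = [c + lr * g for c, g in zip(cont, grad)]
        score = similarity(cont, disc)
        if best is None or score > best[0]:
            best = (score, cont, disc)
    return best

score, cont, disc = optimize()
```

In the actual method, the continuous branch would be driven by backpropagating the CLIP loss through a differentiable neural renderer rather than finite differences, and the discrete branch would need a strategy more scalable than brute-force enumeration; this sketch only shows why the two parameter types call for different treatment within one objective.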