An AI illustrator aims to automatically design visually appealing images for books that provoke rich thoughts and emotions. To achieve this goal, we propose a framework for translating raw descriptions with complex semantics into semantically corresponding images. The main challenge lies in the complexity of the semantics of raw descriptions, which may be hard to visualize (\textit{e.g.}, "gloomy" or "Asian"). Existing methods usually struggle with such descriptions. To address this issue, we propose a \textbf{P}rompt-based \textbf{C}ross-\textbf{M}odal Generation \textbf{Frame}work (PCM-Frame) that leverages two powerful pre-trained models, CLIP and StyleGAN. Our framework consists of two components: a prompt-based projection module from \textit{Text Embedding}s to \textit{Image Embedding}s, and an adapted image generation module built on StyleGAN, which takes \textit{Image Embedding}s as inputs and is trained with combined semantic consistency losses. To bridge the gap between realistic images and illustration designs, we further adopt a stylization model as a post-processing step for better visual effects. Because it builds on pre-trained models, our method can handle complex descriptions and does not require external paired data for training. Furthermore, we have built a benchmark that consists of 200 raw descriptions. We conduct a user study to demonstrate that our method outperforms competing methods on complicated texts. We release our code at \href{https://github.com/researchmm/AI\_Illustrator}{https://github.com/researchmm/AI\_Illustrator}.
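The two-stage idea described above (project a text embedding into the image-embedding space, then compare the embedding of the generated image against that target) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not specify the form of the projection module or of the semantic consistency losses, so a plain linear map and a cosine distance are used as stand-ins, and the names `project_text_to_image` and `semantic_consistency_loss` are hypothetical rather than taken from the released code.

```python
import math


def normalize(v):
    """Scale a vector to unit length (CLIP-style embeddings are usually compared after normalization)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_similarity(a, b):
    """Cosine similarity between two vectors of equal length."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def project_text_to_image(text_emb, weights):
    """Hypothetical stand-in for the projection module.

    Assumption: a single linear map (one row of `weights` per output
    dimension) followed by normalization. The actual module is
    prompt-based and is not described in the abstract.
    """
    projected = [sum(w * x for w, x in zip(row, text_emb)) for row in weights]
    return normalize(projected)


def semantic_consistency_loss(target_img_emb, generated_img_emb):
    """Assumed loss: cosine distance between the target image embedding
    and the embedding of the generated image (0 means identical direction)."""
    return 1.0 - cosine_similarity(target_img_emb, generated_img_emb)


if __name__ == "__main__":
    # Toy 2-D example with an identity projection.
    identity = [[1.0, 0.0], [0.0, 1.0]]
    target = project_text_to_image([3.0, 4.0], identity)
    print(semantic_consistency_loss(target, [3.0, 4.0]))
```

In a real pipeline the two embeddings would come from a pre-trained CLIP model and the generated image from StyleGAN; the sketch only shows how a target embedding and a consistency score relate.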