Drawing and annotating comic illustrations is a complex and difficult process. No existing machine learning algorithms have been developed to create comic illustrations from descriptions of the illustrations or from the dialogue in the comics. Moreover, it is not known whether a generative adversarial network (GAN) can generate original comics that correspond to such dialogue and/or descriptions. GANs have been successful at producing photo-realistic images, but this success does not necessarily translate to flawless comic generation. What is more, evaluating comics is a prominent challenge, as common metrics such as the Inception Score are designed for photographs and do not transfer reliably to comics. In this paper: 1. We implement ComicGAN, a novel text-to-comic pipeline based on a text-to-image GAN that synthesizes comics according to text descriptions. 2. We describe an in-depth empirical study of the technical difficulties of comic generation using GANs. ComicGAN has two novel features: (i) text description creation from labels via permutation and augmentation, and (ii) custom image encoding with Convolutional Neural Networks, each sketched below. We extensively evaluate the proposed ComicGAN in two scenarios: image generation from descriptions and image generation from dialogue. Our results on 1000 Dilbert comic panels and 6000 descriptions show that synthetic comic panels generated from text inputs resemble the original Dilbert panels. Our novel methods for text description creation and custom image encoding improved the Fréchet Inception Distance, detail, and overall image quality over baseline algorithms. Generating illustrations from descriptions produced clear comics containing the characters and colours specified in the descriptions.
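To make feature (i) concrete, the following is a minimal sketch of text description creation from labels via permutation and augmentation. The label vocabulary, synonym table, and function names are hypothetical illustrations, not the authors' implementation.

import itertools

# Hypothetical synonym table used to augment label words.
SYNONYMS = {
    "talking": ["speaking", "chatting"],
    "office": ["workplace", "desk"],
}

def augment(word):
    """Return the word itself plus any of its (hypothetical) synonyms."""
    return [word] + SYNONYMS.get(word, [])

def descriptions_from_labels(labels):
    """Permute the panel labels and substitute synonyms, multiplying one
    label set into many distinct natural-language descriptions."""
    texts = []
    for perm in itertools.permutations(labels):
        for words in itertools.product(*(augment(w) for w in perm)):
            texts.append(" ".join(words))
    return texts

# Example: three labels expand into dozens of distinct descriptions.
print(descriptions_from_labels(["Dilbert", "talking", "office"])[:5])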
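Feature (ii), custom image encoding, can likewise be sketched as a small convolutional encoder. The layer sizes, the 64x64 input resolution, and the 128-dimensional embedding below are illustrative assumptions in PyTorch, not the paper's exact architecture.

import torch
import torch.nn as nn

class ComicEncoder(nn.Module):
    """Encodes a comic panel into a fixed-size embedding with a CNN."""
    def __init__(self, embed_dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1),   # 64x64 -> 32x32
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 4, stride=2, padding=1),  # 32x32 -> 16x16
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), # 16x16 -> 8x8
            nn.ReLU(inplace=True),
        )
        self.fc = nn.Linear(128 * 8 * 8, embed_dim)

    def forward(self, x):
        h = self.features(x)
        return self.fc(h.flatten(start_dim=1))

# Usage: encode a batch of four 64x64 panels into 128-d embeddings.
panels = torch.randn(4, 3, 64, 64)
print(ComicEncoder()(panels).shape)  # torch.Size([4, 128])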