Synthesizing a realistic image from a textual description is a major challenge in computer vision. Current text-to-image synthesis approaches fall short of producing a high-resolution image that faithfully represents the text description. Most existing studies rely on either Generative Adversarial Networks (GANs) or Variational Autoencoders (VAEs). GANs can produce sharper images but lack diversity in their outputs, whereas VAEs produce a diverse range of outputs, but the generated images are often blurred. Taking into account the relative advantages of both GANs and VAEs, we propose a new stacked Conditional VAE (CVAE) and Conditional GAN (CGAN) network architecture for synthesizing images conditioned on a text description. The Conditional VAE serves as an initial generator that produces a high-level sketch of the text description. This first-stage sketch, together with the text description, is used as the input to the Conditional GAN. The second-stage GAN produces a 256x256 high-resolution image. The proposed architecture benefits from conditioning augmentation and residual blocks in the Conditional GAN network. Multiple experiments were conducted on the CUB and Oxford-102 datasets, and the results of the proposed approach are compared against state-of-the-art techniques such as StackGAN. The experiments illustrate that the proposed method generates high-resolution images conditioned on text descriptions and yields competitive results in terms of Inception and Frechet Inception scores on both datasets.
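The conditioning augmentation mentioned above can be illustrated with a minimal sketch. It assumes the formulation commonly used in StackGAN-style models, in which the conditioning vector is sampled from a Gaussian whose mean and log-variance are derived from the text embedding, with a KL term toward a standard normal as a regularizer. The function names and the hard-coded `mu` and `log_var` values are hypothetical; in a trained model they would come from a learned layer applied to the text embedding, and the abstract does not specify these details.

```python
import math
import random


def conditioning_augmentation(mu, log_var, rng):
    """Sample c = mu + sigma * eps with eps ~ N(0, I) (reparameterization)."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mu, log_var)
    ]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(sigma^2)) || N(0, I)) for a diagonal Gaussian."""
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var)
    )


# Hypothetical statistics for one text description (assumed values,
# standing in for the output of a learned projection of the embedding).
mu = [0.2, -0.1, 0.5, 0.0]
log_var = [-1.0, -0.5, 0.0, -2.0]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
c1 = conditioning_augmentation(mu, log_var, rng)
c2 = conditioning_augmentation(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)
```

Sampling twice for the same description gives two different conditioning vectors, which is one way such a model can produce varied images from a single text input.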