We explore novel approaches to the task of generating images from their captions, building on state-of-the-art GAN architectures. In particular, we baseline our models against attention-based GANs, which learn attention mappings from words to image features. To better capture the features of the descriptions, we then build a novel cyclic design that learns an inverse function to map the generated image back to the original caption. Additionally, we incorporate recently developed pretrained BERT word embeddings as our initial text featurizer and observe a noticeable qualitative and quantitative improvement over the attention-GAN baseline.
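The cyclic design described above can be summarized as a cycle-consistency objective: features recovered from the generated image should match the features of the original caption. A minimal sketch of such a loss term, assuming caption and reconstructed features are fixed-size vectors (the function name and MSE formulation here are illustrative, not the paper's exact objective):

```python
import numpy as np

def cycle_consistency_loss(text_feat: np.ndarray, recon_feat: np.ndarray) -> float:
    """Mean-squared error between the original caption features and the
    features recovered from the generated image (hypothetical sketch)."""
    return float(np.mean((recon_feat - text_feat) ** 2))

# Illustrative usage: identical features incur zero penalty,
# mismatched features are penalized.
orig = np.array([0.2, 0.5, 0.1, 0.9])
recon = np.array([0.2, 0.5, 0.1, 0.9])
loss = cycle_consistency_loss(orig, recon)  # 0.0 for a perfect reconstruction
```

In training, this term would be added to the standard GAN losses, encouraging the generator to preserve caption semantics in the generated image.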