We introduce a memory-driven semi-parametric approach to text-to-image generation that combines parametric and non-parametric techniques. The non-parametric component is a memory bank of image features constructed from a training set of images. The parametric component is a generative adversarial network (GAN). Given a new text description at inference time, the memory bank is used to selectively retrieve image features, which are provided to the generator as basic information about the target image and enable it to produce realistic synthetic results. We also incorporate this content information into the discriminator, together with semantic features, allowing the discriminator to make more reliable predictions. Experimental results demonstrate that the proposed memory-driven semi-parametric approach produces more realistic images than purely parametric approaches, in terms of both visual fidelity and text-image semantic consistency.
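The abstract does not say how the memory bank is queried. As a minimal sketch, assuming each memory entry pairs a key vector living in the same space as the text embedding with a stored image feature, and assuming cosine similarity with top-k selection as the retrieval criterion, selective retrieval could look like the following. `MemoryBank`, `retrieve`, `cosine`, and `k` are hypothetical names chosen for illustration; this is not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


class MemoryBank:
    """Hypothetical non-parametric memory of (key, image_feature) pairs.

    Assumption: keys share an embedding space with the text query, so a
    text embedding can be compared to them directly.
    """

    def __init__(self) -> None:
        self.entries: List[Tuple[List[float], List[float]]] = []

    def add(self, key: Vector, image_feature: Vector) -> None:
        # Populated offline from the training images.
        self.entries.append((list(key), list(image_feature)))

    def retrieve(self, query: Vector, k: int = 1) -> List[List[float]]:
        # Rank every entry by cosine similarity to the query (assumed criterion)
        # and return the image features of the k best matches.
        ranked = sorted(self.entries, key=lambda e: cosine(query, e[0]), reverse=True)
        return [feature for _, feature in ranked[:k]]


# Usage with toy 2-D keys and 3-D image features.
bank = MemoryBank()
bank.add([1.0, 0.0], [0.9, 0.1, 0.0])  # e.g. features of a "bird" training image
bank.add([0.0, 1.0], [0.0, 0.2, 0.8])  # e.g. features of a "flower" training image
retrieved = bank.retrieve([0.9, 0.1], k=1)
```

In the full approach, the retrieved features would be passed to the GAN generator (and, per the abstract, to the discriminator as well) as additional conditioning alongside the text; how they are fused is not described in the abstract, so that step is left out of the sketch.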