We propose a novel Text-to-Image generation network, the Adaptive Layout Refinement Generative Adversarial Network (ALR-GAN), which adaptively refines the layout of synthesized images without any auxiliary information. ALR-GAN comprises an Adaptive Layout Refinement (ALR) module and a Layout Visual Refinement (LVR) loss. The ALR module aligns the layout structure (i.e., the locations of objects and background) of a synthesized image with that of its corresponding real image. Within the ALR module, we propose an Adaptive Layout Refinement (ALR) loss that balances the matching of hard and easy features, enabling more efficient layout-structure matching. Based on the refined layout structure, the LVR loss further refines the visual representation within the layout area. Experimental results on two widely used datasets show that ALR-GAN achieves competitive performance on the Text-to-Image generation task.
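The abstract describes the ALR loss only at a high level. A minimal sketch of one plausible form, assuming per-location layout features for the synthesized and real images and a focal-style weight (a hypothetical choice, not the paper's exact formulation) that up-weights hard matches (large feature distance) relative to easy ones:

```python
import numpy as np

def alr_loss_sketch(fake_feats, real_feats, gamma=2.0):
    """Sketch of an adaptive layout-matching loss.

    fake_feats, real_feats: arrays of shape (num_locations, feat_dim)
    holding layout features of the synthesized and real image.
    gamma: focusing exponent; larger gamma emphasizes hard matches more.
    """
    # Per-location squared distance between layout features.
    d = np.sum((fake_feats - real_feats) ** 2, axis=-1)
    # Normalize distances to [0, 1] to define a "hardness" score.
    hardness = d / (d.max() + 1e-8)
    # Focal-style weights: hard matches (large distance) get larger weight,
    # easy (already well-aligned) locations are down-weighted.
    w = hardness ** gamma
    return float(np.mean(w * d))
```

Identical layouts yield zero loss, while locations whose features disagree most dominate the average, which is one simple way to "balance the matching of hard and easy features".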