Generative learning and discriminative learning have both recently made remarkable progress using Deep Neural Networks (DNNs). Structured input synthesis and structured output prediction problems (e.g., layout-to-image synthesis and image semantic segmentation, respectively) are nevertheless often studied separately. This paper proposes deep consensus learning (DCL) for joint layout-to-image synthesis and weakly-supervised image semantic segmentation. The former is realized by the recently proposed LostGAN approach, and the latter by introducing an inference network as a third player that joins the two-player game of LostGAN. Two deep consensus mappings are exploited to facilitate training the three networks end-to-end. First, given an input layout (a list of object bounding boxes), the generator generates a mask (label map) and then uses it to help synthesize an image. The inference network infers the mask for the synthesized image, and the latent consensus is measured between the mask generated by the generator and the one inferred by the inference network. Second, for the real image corresponding to the input layout, the inference network also computes a mask, which the generator then uses to reconstruct the real image. The data consensus is measured between the real image and its reconstruction. The discriminator still plays the role of an adversary by computing realness scores for a real image, its reconstruction, and a synthesized image. In experiments, DCL is evaluated on the COCO-Stuff dataset, where it obtains compelling results for both layout-to-image synthesis and weakly-supervised image semantic segmentation.
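To make the two consensus mappings concrete, the following is a minimal NumPy sketch of one forward pass through the three players. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not give the loss forms, so per-pixel cross-entropy for the latent consensus and an L1 distance for the data consensus are assumed here. The fixed color `PALETTE`, the toy sizes, and the functions `generate_mask`, `synthesize`, `infer_mask_probs`, and `realness` are hypothetical stand-ins for the learned generator, inference network, and discriminator.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_CLASSES, H, W = 3, 8, 8  # toy sizes (assumption); class 0 is background

# Hypothetical fixed palette standing in for learned generator weights.
PALETTE = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])

def generate_mask(layout):
    """Generator, step 1: rasterize boxes (class_id, x0, y0, x1, y1) into a label map."""
    mask = np.zeros((H, W), dtype=np.int64)
    for cls, x0, y0, x1, y1 in layout:
        mask[y0:y1, x0:x1] = cls
    return mask

def synthesize(mask, noise_scale=0.05):
    """Generator, step 2: turn a label map into an (H, W, 3) image."""
    return PALETTE[mask] + noise_scale * rng.standard_normal((H, W, 3))

def infer_mask_probs(image):
    """Inference network stand-in: per-pixel class probabilities, shape (H, W, C)."""
    dist = ((image[:, :, None, :] - PALETTE[None, None, :, :]) ** 2).sum(-1)
    logits = -dist / 0.1
    logits -= logits.max(-1, keepdims=True)  # stabilize the softmax
    probs = np.exp(logits)
    return probs / probs.sum(-1, keepdims=True)

def latent_consensus(gen_mask, probs):
    """Assumed form: mean cross-entropy between generated and inferred masks."""
    picked = np.take_along_axis(probs, gen_mask[..., None], axis=-1)[..., 0]
    return float(-np.log(picked + 1e-12).mean())

def data_consensus(real, recon):
    """Assumed form: mean absolute (L1) difference between image and reconstruction."""
    return float(np.abs(real - recon).mean())

def realness(image):
    """Placeholder discriminator score; a real one would be a trained network."""
    return float(np.tanh(image.mean()))

layout = [(1, 0, 0, 4, 4), (2, 4, 4, 8, 8)]  # two boxes with class ids 1 and 2

# Path 1: layout -> generated mask -> synthesized image -> inferred mask.
gen_mask = generate_mask(layout)
fake = synthesize(gen_mask)
latent = latent_consensus(gen_mask, infer_mask_probs(fake))

# Path 2: real image -> inferred mask -> reconstructed image.
real = synthesize(generate_mask(layout), noise_scale=0.1)  # stand-in for a real photo
real_mask = infer_mask_probs(real).argmax(-1)
recon = synthesize(real_mask)
data = data_consensus(real, recon)

# The discriminator scores all three images, as described in the abstract.
scores = {name: realness(img) for name, img in
          [("real", real), ("reconstructed", recon), ("synthesized", fake)]}
print(f"latent consensus: {latent:.4f}, data consensus: {data:.4f}")
```

In an actual training loop both consensus terms would be added to the adversarial losses and minimized jointly over the generator and the inference network; how the terms are weighted is not stated in the abstract, so this sketch leaves it open.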