Semantically guided conditional Generative Adversarial Networks (cGANs) have become a popular approach for face editing in recent years. However, most existing methods introduce semantic masks as direct conditional inputs to the generator and often require target masks to perform the corresponding translation in the RGB space. We propose SeCGAN, a novel label-guided cGAN for editing face images that utilises semantic information without requiring the user to specify target semantic masks. During training, SeCGAN has two branches of generators and discriminators operating in parallel: one is trained to translate RGB images, the other to translate semantic masks. To bridge the two branches in a mutually beneficial manner, we introduce a semantic consistency loss which constrains both branches to produce semantically consistent outputs. Whilst both branches are required during training, the RGB branch is our primary network and the semantic branch is not needed for inference. Our results on CelebA and CelebA-HQ demonstrate that our approach generates facial images with more accurate attributes, outperforming competitive baselines in terms of Target Attribute Recognition Rate whilst maintaining quality metrics such as self-supervised Fréchet Inception Distance and Inception Score.
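To make the semantic consistency loss concrete, the sketch below shows one plausible PyTorch formulation, assuming a frozen face parser `seg_net` that maps RGB images to per-class logits. The function name, the symmetric cross-entropy form, and all argument names are illustrative assumptions rather than the paper's exact definition.

```python
import torch
import torch.nn.functional as F

def semantic_consistency_loss(rgb_output, mask_logits, seg_net):
    """Illustrative semantic consistency loss (hypothetical formulation).

    rgb_output:  edited face from the RGB branch, shape (B, 3, H, W)
    mask_logits: per-class logits from the semantic branch, shape (B, C, H, W)
    seg_net:     a face parser (weights frozen by the caller) mapping
                 RGB images to C-class logits of shape (B, C, H, W)
    """
    # Parse the RGB branch's output; gradients still flow back to rgb_output
    # even though seg_net's own weights are frozen.
    parsed_logits = seg_net(rgb_output)

    # Each branch is pulled towards the other's (detached) semantic prediction,
    # so the constraint is mutual rather than one-directional.
    loss_rgb = F.cross_entropy(parsed_logits, mask_logits.argmax(dim=1).detach())
    loss_seg = F.cross_entropy(mask_logits, parsed_logits.argmax(dim=1).detach())
    return loss_rgb + loss_seg
```

Detaching each branch's prediction when it serves as the other's target is one way to keep the constraint mutual without letting either branch dominate the gradient flow; this design choice is an assumption of the sketch.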