The Swapping Autoencoder achieved state-of-the-art performance in deep image manipulation and image-to-image translation. We improve on this work by introducing a simple yet effective auxiliary module based on gradient reversal layers. The auxiliary module's loss forces the generator to learn to reconstruct an image from an all-zero texture code, encouraging better disentanglement between structure and texture information. The proposed attribute-based transfer method enables refined control in style transfer while preserving structural information, without requiring a semantic mask. To manipulate an image, we encode both the geometry of the objects and the general style of the input images into two latent codes, with an additional constraint that enforces structure consistency. Moreover, thanks to the auxiliary loss, training time is significantly reduced. The superiority of the proposed model is demonstrated in complex domains, such as satellite images, where state-of-the-art methods are known to fail. Lastly, we show that our model improves quality metrics across a wide range of datasets while achieving results comparable with multi-modal image generation techniques.
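The core mechanism of the auxiliary module is the gradient reversal layer: it acts as the identity in the forward pass, but negates (and optionally scales) gradients in the backward pass, so upstream layers are pushed in the opposite direction of the downstream objective. The sketch below is a minimal, framework-free illustration of that behavior; the class name and the scaling parameter `lam` are illustrative choices, not the paper's actual implementation.

```python
class GradientReversalLayer:
    """Identity in the forward pass; multiplies incoming gradients by
    -lam in the backward pass, so layers upstream of this point are
    trained *against* the objective placed downstream of it."""

    def __init__(self, lam: float = 1.0):
        # lam controls the strength of the reversed gradient signal.
        self.lam = lam

    def forward(self, x):
        # Forward pass: values pass through unchanged (identity).
        return x

    def backward(self, grad_out):
        # Backward pass: flip the sign of each gradient component
        # and scale it by lam.
        return [-self.lam * g for g in grad_out]


grl = GradientReversalLayer(lam=1.0)
print(grl.forward([3.0, -1.5]))    # identity: [3.0, -1.5]
print(grl.backward([2.0, -0.5]))   # reversed: [-2.0, 0.5]
```

In a deep-learning framework this would typically be implemented as a custom autograd function (e.g. a `torch.autograd.Function` in PyTorch) inserted between the texture encoder and the auxiliary reconstruction loss, so that minimizing the auxiliary loss downstream simultaneously discourages the structure code from carrying texture information upstream.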