Image-to-image translation aims to preserve source contents while translating them to the discriminative style of a target domain between two visual domains. Most works apply adversarial learning in the ambient image space, which can be computationally expensive and difficult to train. In this paper, we propose to deploy an energy-based model (EBM) in the latent space of a pretrained autoencoder for this task. The pretrained autoencoder serves as both a latent code extractor and an image reconstructor. Our model is based on the assumption that the two domains share the same latent space, where a latent representation is implicitly decomposed into a content code and a domain-specific style code. Instead of explicitly extracting the two codes and applying adaptive instance normalization to combine them, our latent EBM implicitly learns to transport the source style code to the target style code while preserving the content code, which is an advantage over existing image translation methods. This simplified solution is also far more efficient in the one-sided unpaired image translation setting. Qualitative and quantitative comparisons demonstrate superior translation quality and faithful content preservation. To the best of our knowledge, our model is the first to be applicable to 1024$\times$1024-resolution unpaired image translation.
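Below is a minimal sketch of the test-time translation loop the abstract describes: encode the source image with the frozen pretrained autoencoder, transport its latent code toward the target domain by short-run Langevin dynamics on the latent EBM, then decode. The names `encoder`, `decoder`, and `energy_net`, as well as all hyperparameters, are illustrative assumptions, and the update follows the standard Langevin form rather than the paper's exact schedule.

```python
import torch

def langevin_translate(x_src, encoder, decoder, energy_net,
                       n_steps=60, step_size=0.05, noise_scale=0.01):
    """Translate a source image via short-run Langevin dynamics in latent space.

    Assumes a frozen pretrained autoencoder (encoder/decoder) and a learned
    scalar energy function over latent codes (energy_net). The EBM's gradient
    is expected to move the style part of the code toward the target domain
    while implicitly leaving the content part intact.
    """
    z = encoder(x_src).detach()              # extract latent code with the frozen encoder
    for _ in range(n_steps):
        z = z.clone().requires_grad_(True)
        energy = energy_net(z).sum()          # scalar energy of the current batch of codes
        grad, = torch.autograd.grad(energy, z)
        # Langevin step: gradient descent on the energy plus Gaussian noise.
        z = (z - 0.5 * step_size * grad
               + noise_scale * torch.randn_like(z)).detach()
    return decoder(z)                         # reconstruct the translated image
```

Because sampling happens entirely in the low-dimensional latent space and only the small energy network is trained, this loop avoids adversarial training in the ambient image space, which is the efficiency argument the abstract makes.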