Image-to-image translation aims to learn a mapping between different groups of visually distinguishable images. While recent methods have shown an impressive ability to change even the intricate appearance of images, they still rely on domain labels to train a model to distinguish between distinct visual features. This dependency on labels often significantly limits the scope of applications, since consistent and high-quality labels are expensive to obtain. Instead, we wish to capture visual features from the images themselves and apply them to enable realistic translation without human-generated labels. To this end, we propose an unsupervised image-to-image translation method based on contrastive learning. The key idea is to learn a discriminator that differentiates between distinctive styles and to let this discriminator supervise a generator to transfer those styles across images. During training, we randomly sample a pair of images and train the generator to change the appearance of one toward the other while keeping its original structure. Experimental results show that our method outperforms the leading unsupervised baselines in terms of visual quality and translation accuracy.
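To make the training procedure concrete, the following is a minimal sketch of one training step under stated assumptions: `generator` and `style_discriminator` are hypothetical modules standing in for the paper's networks, the style supervision is approximated with an InfoNCE-style contrastive loss over in-batch pairs, and the temperature value is assumed. This is illustrative only, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def training_step(generator, style_discriminator, images):
    """One hypothetical training step: randomly pair images and train the
    generator to move the appearance of one toward the other.

    `generator(content, style_ref)` and `style_discriminator(x)` are assumed
    interfaces; the discriminator is treated as a style encoder whose
    embeddings drive a contrastive loss.
    """
    batch_size = images.size(0)

    # Randomly pair each image in the batch with another image.
    perm = torch.randperm(batch_size)
    content, style_ref = images, images[perm]

    # Translate: keep the structure of `content`, adopt the style of `style_ref`.
    translated = generator(content, style_ref)

    # Embed translated images and style references into a normalized style space.
    z_fake = F.normalize(style_discriminator(translated), dim=1)
    z_real = F.normalize(style_discriminator(style_ref), dim=1)

    # InfoNCE-style loss: pull each translated image toward its own style
    # reference and push it away from the other styles in the batch.
    # Temperature 0.07 is an assumed hyperparameter.
    logits = z_fake @ z_real.t() / 0.07
    targets = torch.arange(batch_size, device=images.device)
    style_loss = F.cross_entropy(logits, targets)
    return style_loss
```

In this sketch, the discriminator plays the role the abstract describes: rather than classifying domain labels, it learns a style embedding that supervises the generator to match each sampled style reference.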