State-of-the-art image-to-image translation methods tend to struggle in an imbalanced domain setting, where one image domain lacks richness and diversity. We introduce a new unsupervised translation network, BalaGAN, specifically designed to tackle the domain imbalance problem. We leverage the latent modalities of the richer domain to turn the image-to-image translation problem between two imbalanced domains into a balanced, multi-class, conditional translation problem that more closely resembles the style-transfer setting. Specifically, we analyze the source domain and learn, without any supervision, a decomposition of it into a set of latent modes or classes. This leaves us with a multitude of balanced cross-domain translation tasks between all pairs of classes, including the target domain. During inference, the trained network takes as input a source image, together with a reference (style) image from one of the modes as a condition, and produces an image that resembles the source at the pixel level but shares the mode of the reference. We show that exploiting the modalities within the dataset improves the quality of the translated images, and that BalaGAN outperforms strong baselines, both unconditioned and style-transfer-based image-to-image translation methods, in terms of image quality and diversity.
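The decomposition step described above can be illustrated with a minimal sketch. The abstract does not say how the latent modes are learned, so the code below makes several assumptions: it substitutes plain k-means over pre-computed feature vectors for the learned decomposition, and the names `kmeans`, `build_class_labels`, and `translation_pairs` are hypothetical. It shows only the bookkeeping: the rich source domain is split into `k` modes, the sparse target domain becomes one additional class, and every ordered pair of classes defines one translation task.

```python
import itertools

import numpy as np


def kmeans(features, k, n_iters=20, seed=0):
    """Plain Lloyd's k-means; returns one integer mode label per row.

    Assumption: k-means stands in for the unsupervised mode discovery,
    which the abstract does not specify.
    """
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct data points (fancy indexing copies).
    centroids = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(n_iters):
        # Squared distance from every point to every centroid: shape (n, k).
        dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return labels


def build_class_labels(source_features, target_count, k):
    """Label source images with modes 0..k-1 and target images with class k."""
    source_labels = kmeans(source_features, k)
    target_labels = np.full(target_count, k)
    return source_labels, target_labels


def translation_pairs(num_classes):
    """All ordered (from_class, to_class) pairs with from_class != to_class."""
    return list(itertools.permutations(range(num_classes), 2))


# Toy data (hypothetical): 200 source feature vectors, 20 target images, 4 modes.
toy_features = np.random.default_rng(42).normal(size=(200, 8))
src_labels, tgt_labels = build_class_labels(toy_features, target_count=20, k=4)
pairs = translation_pairs(num_classes=4 + 1)  # 4 source modes + 1 target class
print(len(pairs))  # 5 classes give 5 * 4 = 20 ordered pairs
```

In this sketch the target domain is treated as just another class, so a conditional translator trained on all pairs would see the target alongside source modes of comparable size, which is the balancing idea the abstract describes.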