Image-to-image translation models have shown a remarkable ability to transfer images across different domains. Most existing work assumes that the source and target domains are the same at training and inference, and therefore cannot generalize to scenarios where an image must be translated from one unseen domain to another. In this work, we propose the Unsupervised Zero-Shot Image-to-image Translation (UZSIT) problem, which aims to learn a model that can translate samples from image domains not observed during training. Accordingly, we propose a framework called ZstGAN: by introducing an adversarial training scheme, ZstGAN learns to model each domain with a domain-specific feature distribution that is semantically consistent across the vision and attribute modalities. The domain-invariant features are then disentangled with a shared encoder for image generation. We carry out extensive experiments on the CUB and FLO datasets, and the results demonstrate the effectiveness of the proposed method on the UZSIT task. Moreover, ZstGAN shows significant accuracy improvements over state-of-the-art zero-shot learning methods on CUB and FLO.
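To make the adversarial modality-alignment idea concrete, below is a minimal PyTorch sketch. All module names (VisionEncoder, AttributeEncoder, ModalityDiscriminator), layer shapes, and hyper-parameters are illustrative assumptions rather than the authors' implementation; in particular, the 312-dimensional attribute vector is only an assumed CUB-style annotation size.

import torch
import torch.nn as nn

FEAT_DIM = 128  # assumed size of the domain-specific feature space

class VisionEncoder(nn.Module):
    """Maps an image to a domain-specific feature (vision modality)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, FEAT_DIM),
        )

    def forward(self, x):
        return self.net(x)

class AttributeEncoder(nn.Module):
    """Maps a class-level attribute vector into the same feature space."""
    def __init__(self, attr_dim=312):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(attr_dim, 256), nn.ReLU(),
                                 nn.Linear(256, FEAT_DIM))

    def forward(self, a):
        return self.net(a)

class ModalityDiscriminator(nn.Module):
    """Tries to tell vision-derived features from attribute-derived ones."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(FEAT_DIM, 256), nn.ReLU(),
                                 nn.Linear(256, 1))

    def forward(self, f):
        return self.net(f)

# One illustrative adversarial alignment step on a batch from a seen domain.
E_v, E_a, D = VisionEncoder(), AttributeEncoder(), ModalityDiscriminator()
bce = nn.BCEWithLogitsLoss()

images = torch.randn(8, 3, 64, 64)  # images from one training domain
attrs = torch.randn(8, 312)         # that domain's attribute vectors

f_v, f_a = E_v(images), E_a(attrs)
# The discriminator separates the two modalities ...
d_loss = bce(D(f_v.detach()), torch.ones(8, 1)) + \
         bce(D(f_a.detach()), torch.zeros(8, 1))
# ... while the attribute encoder is trained to fool it, so both modalities
# land in one semantically consistent domain-specific feature distribution.
enc_loss = bce(D(f_a), torch.ones(8, 1))

In this reading, an unseen domain's attribute description can be encoded into the same feature space at inference time and combined with the content code from the shared encoder to drive generation toward that domain; the shared encoder and generator are omitted here for brevity.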