Every recent image-to-image translation model inherently requires either image-level (i.e., input-output pairs) or set-level (i.e., domain labels) supervision. However, even set-level supervision can be a severe bottleneck for data collection in practice. In this paper, we tackle image-to-image translation in a fully unsupervised setting, with neither paired images nor domain labels. To this end, we propose a truly unsupervised image-to-image translation model (TUNIT) that simultaneously learns to separate image domains and to translate input images into the estimated domains. Experimental results show that our model achieves performance comparable to or even better than the set-level supervised model trained with full labels, generalizes well across various datasets, and is robust to the choice of hyperparameters (e.g., the preset number of pseudo domains). Furthermore, TUNIT can be easily extended to semi-supervised learning with only a few labeled samples.
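To make the abstract's central idea concrete, a single model that both estimates image domains and translates into them, the following is a minimal, hypothetical PyTorch sketch. It is not the authors' implementation: the module names (GuidingNet, Generator), layer sizes, and the preset domain count K are assumptions, and the full model would additionally need training objectives (e.g., a clustering loss for the domain head and an adversarial loss for translation) that are omitted here.

```python
# Minimal sketch (not the authors' code) of the joint setup described above:
# a guiding network assigns each image to one of K pseudo domains and extracts
# a style code; a generator translates an input image toward the estimated
# domain of a reference image. All names and dimensions are assumptions.
import torch
import torch.nn as nn

K, STYLE_DIM = 10, 128  # preset number of pseudo domains, style-code size

class GuidingNet(nn.Module):
    """Shared encoder with two heads: pseudo-domain logits and a style code."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.domain_head = nn.Linear(128, K)         # estimates the domain
        self.style_head = nn.Linear(128, STYLE_DIM)  # encodes the style

    def forward(self, x):
        h = self.backbone(x)
        return self.domain_head(h), self.style_head(h)

class Generator(nn.Module):
    """Translates x using the style code of a reference image."""
    def __init__(self):
        super().__init__()
        self.encode = nn.Sequential(nn.Conv2d(3, 64, 3, 1, 1), nn.ReLU())
        self.modulate = nn.Linear(STYLE_DIM, 64)     # crude style injection
        self.decode = nn.Conv2d(64, 3, 3, 1, 1)

    def forward(self, x, style):
        h = self.encode(x)
        h = h * self.modulate(style)[:, :, None, None]
        return torch.tanh(self.decode(h))

# Usage: translate a source batch into the estimated domain of a reference.
E, G = GuidingNet(), Generator()
x_src, x_ref = torch.randn(4, 3, 64, 64), torch.randn(4, 3, 64, 64)
logits, style = E(x_ref)
pseudo_domain = logits.argmax(dim=1)  # estimated domain labels of the reference
x_fake = G(x_src, style)              # translation guided by the reference style
```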