Given an input image from a source domain and a guidance image from a target domain, unsupervised many-to-many image-to-image (UMMI2I) translation methods seek to generate a plausible example from the target domain that preserves the domain-invariant information of the input source image and inherits the domain-specific information from the guidance image. For example, when translating female faces to male faces, the generated male face should have the same expression, pose, and hair color as the input female image, and the same facial hairstyle and other male-specific attributes as the guidance male image. Current state-of-the-art UMMI2I methods generate visually pleasing images, but, since for most pairs of real datasets we do not know which attributes are domain-specific and which are domain-invariant, the semantic correctness of existing approaches has not yet been quantitatively evaluated. In this paper, we propose a set of benchmarks and metrics for evaluating the semantic correctness of these methods. We provide an extensive study of existing state-of-the-art UMMI2I translation methods, showing that all methods, to different degrees, fail to infer from data which attributes are domain-specific and which are domain-invariant, and mostly rely on inductive biases hard-coded into their architectures.