Unsupervised image-to-image translation methods aim to map images from one domain to plausible examples in another domain while preserving the structures shared across the two domains. In the many-to-many setting, an additional guidance example from the target domain is used to determine the domain-specific attributes of the generated image. In the absence of attribute annotations, methods must infer from the training data which factors are specific to each domain. Many state-of-the-art methods hard-code the desired shared-vs-specific split into their architecture, severely restricting the scope of the problem. In this paper, we propose a new method that does not rely on such inductive architectural biases and instead infers which attributes are domain-specific from data, by constraining the information flow through the network with translation honesty losses and a penalty on the capacity of the domain-specific embedding. We show that the proposed method achieves consistently high manipulation accuracy on two synthetic datasets and one natural dataset spanning a wide variety of domain-specific and shared attributes.