Image-to-image translation aims to convert an image from one style to another. Decomposing an image into disentangled content and style components can make the synthesized images more photo-realistic and better at preserving identity. While existing models focus on designing specialized network architectures to separate the two components, this paper investigates how to explicitly constrain the content and style statistics of images. We achieve this by decomposing the input image into high-frequency and low-frequency information, which correspond to content and style, respectively. We regulate the frequency distribution in two ways: (a) a spatial-level restriction that locally constrains the frequency distribution of images; and (b) a spectral-level regulation that enhances global consistency among images. On multiple datasets, we show that the proposed approach consistently yields significant improvements when added to various state-of-the-art image translation models.
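To make the high-frequency/low-frequency split concrete, the following is a minimal sketch of one way such a decomposition could be computed. It is not the paper's method: the abstract does not specify the transform, so an ideal low-pass filter in the Fourier domain is assumed here, and the function name `split_frequencies` and the parameter `cutoff` are hypothetical.

```python
import numpy as np


def split_frequencies(image, cutoff=0.1):
    """Split a 2-D grayscale image into low- and high-frequency parts.

    Assumption: an ideal (hard-edged) low-pass filter in the Fourier
    domain. `cutoff` is a hypothetical parameter giving the radius of the
    pass band in cycles per pixel (0.5 is the Nyquist frequency).
    """
    h, w = image.shape
    # Frequency coordinates of each FFT bin, in cycles per pixel.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    # Keep only the bins whose radial frequency lies inside the cutoff.
    low_mask = np.sqrt(fx**2 + fy**2) <= cutoff

    spectrum = np.fft.fft2(image)
    # Low-frequency part: inverse transform of the masked spectrum.
    low = np.fft.ifft2(spectrum * low_mask).real
    # High-frequency part: whatever the low-pass filter removed.
    high = image - low
    return low, high


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((64, 64))
    low, high = split_frequencies(img, cutoff=0.1)
    # By construction the two parts sum back to the input image.
    print(np.allclose(low + high, img))
```

Under the abstract's interpretation, `low` would play the role of the style-like component and `high` the content-like component. A smoother filter (for example, a Gaussian blur) could be swapped in for the hard mask, which avoids ringing artifacts at the cost of a less sharply defined band boundary.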