The goal of image style transfer is to render an image with artistic features guided by a style reference while maintaining the original content. Due to the locality and spatial invariance of convolutional neural networks (CNNs), it is difficult to extract and maintain the global information of input images. As a result, traditional neural style transfer methods are usually biased, and content leak can be observed by running the style transfer process several times with the same reference style image. To address this critical issue, we take the long-range dependencies of input images into account for unbiased style transfer by proposing a transformer-based approach, namely StyTr^2. In contrast with visual transformers for other vision tasks, our StyTr^2 contains two different transformer encoders that generate domain-specific sequences for content and style, respectively. Following the encoders, a multi-layer transformer decoder is adopted to stylize the content sequence according to the style sequence. In addition, we analyze the deficiency of existing positional encoding methods and propose content-aware positional encoding (CAPE), which is scale-invariant and thus better suited to the image style transfer task. Qualitative and quantitative experiments demonstrate the effectiveness of the proposed StyTr^2 compared with state-of-the-art CNN-based and flow-based approaches.
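To make the two-encoder/one-decoder design concrete, below is a minimal PyTorch sketch of the layout described above. It is an illustration under assumptions, not the authors' implementation: the class name StyTr2Sketch, the use of stock torch.nn transformer modules, and all hyperparameters (dim, heads, layer counts) are hypothetical, and the real model additionally handles patch embedding, CAPE, and decoding the output sequence back to an image.

```python
# Minimal sketch of the StyTr^2 layout described in the abstract, under
# assumptions: stock torch.nn transformer modules and illustrative
# hyperparameters stand in for the authors' actual architecture.
import torch
import torch.nn as nn


class StyTr2Sketch(nn.Module):
    def __init__(self, dim=512, heads=8, enc_layers=3, dec_layers=3):
        super().__init__()
        # Two domain-specific encoders: one for the content sequence,
        # one for the style sequence.
        self.content_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                       batch_first=True),
            num_layers=enc_layers)
        self.style_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                       batch_first=True),
            num_layers=enc_layers)
        # Multi-layer decoder: content tokens attend to style tokens via
        # cross-attention, stylizing the content sequence.
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model=dim, nhead=heads,
                                       batch_first=True),
            num_layers=dec_layers)

    def forward(self, content_tokens, style_tokens):
        # Both inputs: (batch, num_patches, dim) patch-token sequences.
        c = self.content_encoder(content_tokens)
        s = self.style_encoder(style_tokens)
        return self.decoder(tgt=c, memory=s)  # stylized content sequence


# Smoke test with hypothetical 14x14 patch grids.
content = torch.randn(1, 196, 512)
style = torch.randn(1, 196, 512)
print(StyTr2Sketch()(content, style).shape)  # torch.Size([1, 196, 512])
```

Routing the content sequence as decoder queries and the style sequence as keys/values is what lets every content token draw on the full style image, realizing the long-range dependencies the abstract argues CNNs lack.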
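CAPE can likewise be sketched under assumptions. One plausible realization of the scale-invariance claim is to derive positional encodings from the content features themselves rather than from fixed pixel coordinates: pool the features to a small fixed anchor grid, project them, and resize back to the token grid. The anchor-grid size and the 1x1 projection below are hypothetical choices, not the paper's exact design.

```python
# Hedged sketch of content-aware positional encoding (CAPE): encodings
# are computed from the content features (pooled to a fixed anchor grid,
# projected, and resized back), so they follow image semantics rather than
# absolute coordinates. The anchor grid size and 1x1 projection are
# assumptions, not the paper's exact design.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CAPESketch(nn.Module):
    def __init__(self, dim=512, anchor_grid=18):
        super().__init__()
        # Pool features to a fixed-size anchor grid, regardless of input
        # resolution -- this is what decouples the encoding from scale.
        self.pool = nn.AdaptiveAvgPool2d(anchor_grid)
        # Learnable 1x1 projection producing one encoding per anchor.
        self.proj = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, feat):
        # feat: (batch, dim, H, W) patch-embedding feature map.
        pos = self.proj(self.pool(feat))  # (batch, dim, n, n) anchors
        # Interpolate anchor encodings back to the token grid; similar
        # content yields similar encodings at any input scale.
        pos = F.interpolate(pos, size=feat.shape[-2:], mode='bilinear',
                            align_corners=False)
        return feat + pos
```

Because the encoding is computed from pooled features rather than absolute token indices, rescaling the input changes it smoothly instead of reshuffling it, which is consistent with the scale-invariance property claimed for CAPE.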