We present Unified Contrastive Arbitrary Style Transfer (UCAST), a novel style representation learning and transfer framework that can be integrated into most existing arbitrary image style transfer models, e.g., CNN-based, ViT-based, and flow-based methods. As the key component in image style transfer tasks, a suitable style representation is essential for achieving satisfactory results. Existing approaches based on deep neural networks typically use second-order statistics to generate the output. However, these hand-crafted features computed from a single image cannot exploit style information sufficiently, which leads to artifacts such as local distortions and style inconsistency. To address these issues, we propose to learn style representation directly from a large collection of images via contrastive learning, taking into account the relationships between specific styles and the holistic style distribution. Specifically, we present an adaptive contrastive learning scheme for style transfer by introducing an input-dependent temperature. Our framework consists of three key components: a parallel contrastive learning scheme for style representation and style transfer, a domain enhancement module for effective learning of the style distribution, and a generative network for style transfer. Qualitative and quantitative evaluations show that our approach produces results superior to those obtained via state-of-the-art methods.
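For intuition, the sketch below shows one plausible form of an InfoNCE-style contrastive loss with an input-dependent temperature, as described above. This is a minimal illustration, not the authors' implementation: the temperature-prediction head, the feature shapes, and all names (e.g., `AdaptiveTemperatureContrastiveLoss`, `temp_head`) are hypothetical, and we assume the temperature is predicted per anchor sample and clamped to a fixed range.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveTemperatureContrastiveLoss(nn.Module):
    """InfoNCE-style contrastive loss with a per-sample (input-dependent)
    temperature. A hedged sketch of the adaptive scheme in the abstract,
    not the paper's actual code."""

    def __init__(self, feat_dim=128, tau_min=0.05, tau_max=1.0):
        super().__init__()
        # Hypothetical small head that predicts a temperature from the
        # anchor's style feature; an assumption, not a stated API.
        self.temp_head = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1)
        )
        self.tau_min, self.tau_max = tau_min, tau_max

    def forward(self, anchor, positive, negatives):
        # anchor:    (B, D) style features of stylized outputs
        # positive:  (B, D) style features of the matching style images
        # negatives: (B, K, D) style features of other styles
        anchor = F.normalize(anchor, dim=-1)
        positive = F.normalize(positive, dim=-1)
        negatives = F.normalize(negatives, dim=-1)

        # Input-dependent temperature, squashed into [tau_min, tau_max].
        tau = self.tau_min + (self.tau_max - self.tau_min) * torch.sigmoid(
            self.temp_head(anchor)
        )  # (B, 1)

        # Cosine similarities: one positive logit, K negative logits.
        pos_logit = (anchor * positive).sum(-1, keepdim=True)       # (B, 1)
        neg_logits = torch.einsum("bd,bkd->bk", anchor, negatives)  # (B, K)
        logits = torch.cat([pos_logit, neg_logits], dim=1) / tau    # (B, 1+K)

        # The positive is always at index 0.
        target = torch.zeros(
            logits.size(0), dtype=torch.long, device=logits.device
        )
        return F.cross_entropy(logits, target)
```

Under this reading, a lower predicted temperature sharpens the softmax for anchors whose style is easy to discriminate, while a higher one relaxes it for ambiguous styles, which is one way an input-dependent temperature can adapt the contrastive objective to the style distribution.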