In this paper, we aim to devise a versatile style transfer method capable of performing artistic, photo-realistic, and video style transfer jointly, without seeing videos during training. Previous single-frame methods impose a strong constraint on the whole image to maintain temporal consistency, and this constraint is violated in many cases. Instead, we make the milder and reasonable assumption that global inconsistency is dominated by local inconsistencies, and we devise a generic Contrastive Coherence Preserving Loss (CCPL) applied to local patches. CCPL preserves the coherence of the content source during style transfer without degrading stylization. Moreover, it has a neighbor-regulating mechanism, which substantially reduces local distortions and considerably improves visual quality. Aside from its superior performance on versatile style transfer, CCPL can easily be extended to other tasks, such as image-to-image translation. In addition, to better fuse content and style features, we propose Simple Covariance Transformation (SCT), which effectively aligns the second-order statistics of the content feature with those of the style feature. Experiments demonstrate that the resulting model, when equipped with CCPL, is effective for versatile style transfer.
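To make the idea of a contrastive loss on local patches concrete, the following is a minimal NumPy sketch written under stated assumptions, not the authors' implementation. It assumes content and generated feature maps of shape (C, H, W) taken from the same encoder, compares difference vectors between each sampled anchor and its 8 neighbors, and scores them with an InfoNCE objective: the generated difference vector at a location should match the content difference vector at the same location (positive) rather than those at other locations (negatives). The helper names (`sample_difference_vectors`, `ccpl_loss`), the anchor positions, and the temperature `tau=0.07` are hypothetical choices for illustration; any learned projection head and multi-layer feature extraction are omitted.

```python
import numpy as np

# Offsets of the 8 neighbors around an anchor (3x3 window minus its center).
NEIGHBOR_OFFSETS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                    if (dy, dx) != (0, 0)]


def sample_difference_vectors(feat, anchors):
    """Return neighbor-minus-anchor vectors, shape (len(anchors) * 8, C).

    feat: (C, H, W) feature map. anchors: (y, x) positions at least one pixel
    away from the border, so that all 8 neighbors exist (hypothetical helper).
    """
    diffs = []
    for y, x in anchors:
        center = feat[:, y, x]
        for dy, dx in NEIGHBOR_OFFSETS:
            diffs.append(feat[:, y + dy, x + dx] - center)
    return np.stack(diffs)


def l2_normalize(v, eps=1e-8):
    """Scale each row to unit length so dot products become cosine similarities."""
    return v / (np.linalg.norm(v, axis=1, keepdims=True) + eps)


def ccpl_loss(content_feat, generated_feat, anchors, tau=0.07):
    """Simplified contrastive loss over local difference vectors (assumed form).

    Row i of the similarity matrix compares generated difference vector i with
    every content difference vector; the positive pair sits on the diagonal.
    """
    d_c = l2_normalize(sample_difference_vectors(content_feat, anchors))
    d_g = l2_normalize(sample_difference_vectors(generated_feat, anchors))
    logits = d_g @ d_c.T / tau                       # (N, N) scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)      # stabilize the softmax
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))        # cross-entropy on positives


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    content = rng.standard_normal((32, 8, 8))
    anchors = [(2, 2), (3, 5), (5, 4)]               # hypothetical sampled positions
    shifted = content + 3.0                          # uniform shift, local structure kept
    unrelated = rng.standard_normal((32, 8, 8))      # local structure destroyed
    print(ccpl_loss(content, shifted, anchors), ccpl_loss(content, unrelated, anchors))
```

Because only differences between neighboring positions enter the loss, a uniform shift of the generated features leaves it unchanged, while features with unrelated local structure are penalized heavily. This is one way to read the claim that such a loss can preserve content coherence without constraining the global appearance that stylization changes.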