Fast arbitrary neural style transfer has attracted widespread attention from the academic, industrial, and art communities because its flexibility enables a wide range of applications. Existing solutions either attentively fuse deep style features into deep content features without considering feature distributions, or adaptively normalize deep content features according to the style so that their global statistics match. Although effective, these methods leave shallow features unexplored and do not consider feature statistics locally, so they are prone to unnatural outputs with unpleasant local distortions. To alleviate this problem, we propose a novel attention and normalization module, named Adaptive Attention Normalization (AdaAttN), which adaptively performs attentive normalization on a per-point basis. Specifically, spatial attention scores are learnt from both the shallow and deep features of the content and style images. Then, per-point weighted statistics are calculated by regarding a style feature point as a distribution of the attention-weighted outputs of all style feature points. Finally, the content feature is normalized so that it exhibits the same local feature statistics as the calculated per-point weighted style feature statistics. In addition, a novel local feature loss is derived from AdaAttN to enhance local visual quality. We also extend AdaAttN to video style transfer with slight modifications. Experiments demonstrate that our method achieves state-of-the-art arbitrary image and video style transfer. Code and models are available.
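The three steps described in the abstract (attention scores, per-point weighted style statistics, per-point normalization of the content feature) can be illustrated with a minimal NumPy sketch. This is not the released implementation: the function names `ada_attn`, `instance_norm`, and `softmax`, the flattened `(channels, positions)` layout, and the `eps` value are assumptions made for illustration, and the learnt projections that produce the attention scores are omitted, so the keys are used directly after normalization. The key inputs stand in for the combined shallow and deep features mentioned in the abstract.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def instance_norm(feat, eps=1e-5):
    # feat: (C, N). Normalize each channel over its N spatial positions.
    mean = feat.mean(axis=1, keepdims=True)
    std = feat.std(axis=1, keepdims=True)
    return (feat - mean) / (std + eps)


def ada_attn(content_feat, style_feat, content_key, style_key):
    """Hypothetical per-point attentive normalization.

    content_feat: (C, Nc) content feature to be restyled
    style_feat:   (C, Ns) style feature at the same layer
    content_key:  (D, Nc) stand-in for shallow+deep content features
    style_key:    (D, Ns) stand-in for shallow+deep style features
    """
    # Step 1: attention of every content point over all style points.
    # (Assumption: learnt projections are skipped in this sketch.)
    q = instance_norm(content_key)
    k = instance_norm(style_key)
    attn = softmax(q.T @ k, axis=-1)            # (Nc, Ns), rows sum to 1

    # Step 2: per-point weighted mean and standard deviation of the style
    # feature, using Var[x] = E[x^2] - E[x]^2 under the attention weights.
    mean = style_feat @ attn.T                  # (C, Nc)
    second_moment = (style_feat ** 2) @ attn.T  # (C, Nc)
    std = np.sqrt(np.maximum(second_moment - mean ** 2, 0.0))

    # Step 3: normalize the content feature, then rescale and shift each
    # point with its own style statistics.
    return std * instance_norm(content_feat) + mean


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    content = rng.normal(size=(8, 16))   # C=8 channels, Nc=16 positions
    style = rng.normal(size=(8, 20))     # Ns=20 positions
    c_key = rng.normal(size=(12, 16))    # D=12 key channels
    s_key = rng.normal(size=(12, 20))
    print(ada_attn(content, style, c_key, s_key).shape)
```

Unlike a global scheme that applies one mean and one standard deviation per channel, every content position here receives its own pair of statistics, which is the per-point behaviour the abstract emphasizes.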