Video style transfer is attracting increasing attention in the AI community for its numerous applications, such as augmented reality and animation production. Compared with traditional image style transfer, performing this task on video presents new challenges: how to effectively generate satisfactory stylized results for any specified style while maintaining temporal coherence across frames. To this end, we propose the Multi-Channel Correlation network (MCCNet), which can be trained to fuse exemplar style features and input content features for efficient style transfer while naturally preserving the coherence of input videos. Specifically, MCCNet works directly in the feature space of the style and content domains, where it learns to rearrange and fuse style features based on their similarity with content features. The outputs generated by MCCNet are features containing the desired style patterns, which can further be decoded into images with vivid style textures. Moreover, MCCNet is designed to explicitly align the features to the input, which ensures that the output maintains the content structures as well as the temporal continuity. To further improve the performance of MCCNet under complex lighting conditions, we also introduce an illumination loss during training. Qualitative and quantitative evaluations demonstrate that MCCNet performs well in both arbitrary video and image style transfer tasks.
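To illustrate the kind of similarity-driven rearrangement the abstract describes, the following is a minimal NumPy sketch of channel-wise style-feature fusion. All names, shapes, and the softmax weighting are illustrative assumptions, not the paper's actual architecture: it simply reweights style channels by their similarity to content channels and adds the result back onto the content features to keep the output aligned with the input.

```python
import numpy as np

def channel_fusion(content, style):
    """Fuse style features into content features by channel similarity.

    content, style: (C, N) arrays, where C is the number of feature
    channels and N the number of spatial positions (hypothetical layout;
    a simplified stand-in for the paper's multi-channel correlation).
    """
    # L2-normalize each channel so similarity is scale-invariant.
    c = content / (np.linalg.norm(content, axis=1, keepdims=True) + 1e-8)
    s = style / (np.linalg.norm(style, axis=1, keepdims=True) + 1e-8)
    sim = c @ s.T  # (C, C) content-to-style channel similarity
    # Softmax over style channels turns similarities into mixing weights.
    w = np.exp(sim) / np.exp(sim).sum(axis=1, keepdims=True)
    rearranged = w @ style  # style channels rearranged toward the content
    # Adding the content back keeps the output aligned with the input,
    # which is what preserves structure (and, per frame, coherence).
    return rearranged + content

C, N = 8, 16
out = channel_fusion(np.random.rand(C, N), np.random.rand(C, N))
```

In a real pipeline, `content` and `style` would come from a pretrained encoder (e.g. VGG features) and `out` would be passed to a decoder; here the random inputs only demonstrate the shape-preserving fusion.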