Exemplar-based video colorization is an essential technique for applications such as old movie restoration. Although recent methods perform well on still scenes or scenes with regular motion, they lack robustness on scenes with large motion because of their limited ability to model long-term dependency, both spatially and temporally, which leads to color fading, color discontinuity, and other artifacts. To address this problem, we propose an exemplar-based video colorization framework with long-term spatiotemporal dependency. To enhance long-term spatial dependency, we design a parallelized CNN-Transformer block and a double-head non-local operation. The CNN-Transformer block better integrates long-term spatial dependency with local texture and structural features, and the double-head non-local operation further exploits the augmented features. To enhance long-term temporal dependency, we introduce a novel linkage subnet, which propagates motion information across adjacent frame blocks and helps maintain temporal continuity. Experiments demonstrate that our model outperforms recent state-of-the-art methods both quantitatively and qualitatively, and that it generates more colorful, realistic, and stable results, especially for scenes in which objects change greatly and irregularly.
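To make the spatial-dependency design concrete, the sketch below shows one plausible form of a parallel CNN-Transformer block in PyTorch: a convolutional branch captures local texture and structure while a self-attention branch captures long-range spatial dependency, and the two are fused and added back to the input. The branch layout, fusion by concatenation plus a 1x1 convolution, and all layer sizes are illustrative assumptions on our part, not the paper's implementation; the double-head non-local operation and linkage subnet are likewise not reproduced here.

```python
# Minimal sketch (assumed design, not the authors' code) of a parallelized
# CNN-Transformer block for spatial feature augmentation.
import torch
import torch.nn as nn

class ParallelCNNTransformerBlock(nn.Module):
    """Hypothetical parallel block: a conv branch models local texture,
    a self-attention branch models long-range spatial dependency."""
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # Local branch: two 3x3 convolutions for texture/structural features.
        self.conv_branch = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        # Global branch: multi-head self-attention over all spatial positions
        # (channels must be divisible by num_heads).
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        # Fuse the two branches back to the original channel count.
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        local = self.conv_branch(x)
        # (B, C, H, W) -> (B, H*W, C): treat each pixel as a token.
        tokens = self.norm(x.flatten(2).transpose(1, 2))
        global_feat, _ = self.attn(tokens, tokens, tokens)
        global_feat = global_feat.transpose(1, 2).reshape(b, c, h, w)
        # Concatenate branches, project, and add a residual connection.
        return x + self.fuse(torch.cat([local, global_feat], dim=1))

# Example: a 64-channel feature map at 32x32 resolution passes through
# unchanged in shape, with both local and long-range context mixed in.
block = ParallelCNNTransformerBlock(channels=64)
out = block(torch.randn(2, 64, 32, 32))  # -> torch.Size([2, 64, 32, 32])
```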