Reference-based line-art colorization is a challenging task in computer vision. Color, texture, and shading must be rendered from an abstract sketch, which relies heavily on precise modeling of the long-range dependencies between the sketch and the reference. Popular techniques for bridging the cross-modal information and modeling these long-range dependencies employ the attention mechanism. However, in the context of reference-based line-art colorization, several factors intensify the inherent training difficulty of attention, such as the self-supervised training protocol and GAN-based losses. To understand this training instability, we examine the gradient flow of attention and observe gradient conflict among the attention branches. This phenomenon motivates us to alleviate the gradient issue by preserving the dominant gradient branch while removing the conflicting ones. Using this training strategy, we propose a novel attention mechanism, Stop-Gradient Attention (SGA), which outperforms the attention baseline by a large margin with better training stability. Compared with state-of-the-art modules in line-art colorization, our approach demonstrates significant improvements in Fr\'echet Inception Distance (FID, up to 27.21%) and structural similarity index measure (SSIM, up to 25.67%) on several benchmarks. The code of SGA is available at https://github.com/kunkun0w0/SGA .
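To make the stop-gradient idea concrete, below is a minimal PyTorch sketch of a cross-attention layer in which gradients through the query/key (similarity) branch are detached while the value branch keeps its gradient. This is an illustration only: the choice of which branch is dominant, the module names, and the layer details here are assumptions, not the authors' exact design; see the official repository for the actual SGA implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class StopGradientAttention(nn.Module):
    """Illustrative cross-attention with stop-gradient on conflicting branches.

    Hypothetical sketch of the SGA idea: keep the (assumed) dominant gradient
    branch, here the value path, and detach the inputs to the query/key path
    so conflicting gradients do not flow back through softmax(QK^T).
    """

    def __init__(self, dim: int):
        super().__init__()
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)
        self.scale = dim ** -0.5

    def forward(self, sketch_feat: torch.Tensor, ref_feat: torch.Tensor) -> torch.Tensor:
        # sketch_feat: (B, N, C) line-art tokens; ref_feat: (B, M, C) reference tokens.
        # Stop gradient on the similarity branch: queries/keys are computed from
        # detached features, so no gradient reaches the upstream encoders through them.
        q = self.to_q(sketch_feat.detach())
        k = self.to_k(ref_feat.detach())
        # The value branch keeps its gradient (assumed to be the dominant branch).
        v = self.to_v(ref_feat)
        attn = F.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return attn @ v


# Quick shape check with random features.
x = torch.randn(2, 196, 256)  # sketch tokens
r = torch.randn(2, 196, 256)  # reference tokens
out = StopGradientAttention(256)(x, r)
assert out.shape == (2, 196, 256)
```

Note that detaching only the inputs of the query/key projections still lets those projection weights train through the attention map; the stop-gradient removes the conflicting gradient path into the upstream feature extractors, which is the effect the abstract describes.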