Unsupervised deep learning has recently shown promise in producing high-quality samples. Although it has great potential for the image colorization task, its performance is limited by the high dimensionality of the data manifold and by model capacity. This study presents a novel scheme that applies a score-based generative model in the wavelet domain to address these issues. By exploiting the multi-scale, multi-channel representation provided by the wavelet transform, the proposed model jointly and effectively learns richer priors from stacked coarse and detail wavelet coefficient components. This strategy also reduces the dimension of the original manifold and alleviates the curse of dimensionality, which benefits both estimation and sampling. Moreover, dual consistency terms in the wavelet domain, namely data consistency and structure consistency, are devised to better serve the colorization task. Specifically, in the training phase, a set of multi-channel tensors consisting of wavelet coefficients is used as input to train the network with denoising score matching. In the inference phase, samples are generated iteratively via annealed Langevin dynamics subject to the data and structure consistency terms. Experiments show that the proposed method markedly improves both generation and colorization quality, particularly colorization robustness and diversity.
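The inference procedure summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several pieces are assumptions made for illustration: a single-level Haar transform stands in for the wavelet transform, `toy_score` (the analytic score of a noise-perturbed standard Gaussian) stands in for the trained score network, grayscale is taken as the plain mean of the RGB channels, and `project_gray` is a simple projection playing the role of the data-consistency term. The structure-consistency term is omitted, and all function names and hyperparameters are hypothetical.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level orthonormal 2D Haar transform of an (H, W) array (H, W even)."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    coarse = (a + b + c + d) / 2.0   # low-pass (coarse) subband
    det1 = (a - b + c - d) / 2.0     # three detail subbands
    det2 = (a + b - c - d) / 2.0
    det3 = (a - b - c + d) / 2.0
    return coarse, det1, det2, det3

def haar_idwt2(coarse, det1, det2, det3):
    """Inverse of haar_dwt2."""
    h, w = coarse.shape
    x = np.empty((2 * h, 2 * w), dtype=coarse.dtype)
    x[0::2, 0::2] = (coarse + det1 + det2 + det3) / 2.0
    x[0::2, 1::2] = (coarse - det1 + det2 - det3) / 2.0
    x[1::2, 0::2] = (coarse + det1 - det2 - det3) / 2.0
    x[1::2, 1::2] = (coarse - det1 - det2 + det3) / 2.0
    return x

def to_wavelet_tensor(rgb):
    """Stack the 4 subbands of each colour channel: (H, W, 3) -> (H/2, W/2, 3, 4)."""
    bands = [np.stack(haar_dwt2(rgb[..., ch]), axis=-1) for ch in range(3)]
    return np.stack(bands, axis=-2)

def from_wavelet_tensor(w):
    """Inverse of to_wavelet_tensor: (H/2, W/2, 3, 4) -> (H, W, 3)."""
    chans = [haar_idwt2(*(w[..., ch, k] for k in range(4))) for ch in range(3)]
    return np.stack(chans, axis=-1)

def toy_score(x, sigma):
    """Hypothetical stand-in for the trained network: score of N(0, (1 + sigma^2) I)."""
    return -x / (1.0 + sigma ** 2)

def project_gray(x, gray_w):
    """Assumed data-consistency step: shift all colour channels equally so that
    their mean matches the wavelet coefficients of the grayscale observation."""
    residual = gray_w - x.mean(axis=2)
    return x + residual[:, :, None, :]

def annealed_langevin(gray_w, sigmas, rng, steps_per_level=20, eps=2e-5):
    """Annealed Langevin dynamics over decreasing noise levels, with a
    projection onto the grayscale constraint after every update."""
    h, w, _ = gray_w.shape
    x = rng.standard_normal((h, w, 3, 4))
    for sigma in sigmas:
        step = eps * (sigma / sigmas[-1]) ** 2   # step size scaled per noise level
        for _ in range(steps_per_level):
            z = rng.standard_normal(x.shape)
            x = x + 0.5 * step * toy_score(x, sigma) + np.sqrt(step) * z
            x = project_gray(x, gray_w)
    return x

rng = np.random.default_rng(0)
rgb = rng.random((8, 8, 3))                        # stand-in colour image
w = to_wavelet_tensor(rgb)                         # multi-channel wavelet tensor
gray = rgb.mean(axis=-1)                           # grayscale observation
gray_w = np.stack(haar_dwt2(gray), axis=-1)        # its wavelet coefficients
sigmas = np.geomspace(1.0, 0.01, 10)               # decreasing noise schedule
sample_w = annealed_langevin(gray_w, sigmas, rng)
sample_rgb = from_wavelet_tensor(sample_w)         # back to the image domain
```

Because both the Haar transform and the grayscale mapping are linear, enforcing the constraint on the wavelet coefficients also enforces it on the reconstructed image, which is why the projection can be applied directly in the wavelet domain. With the toy score the colours themselves are not meaningful; the sketch only shows how the stacking, sampling, and consistency steps fit together.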