Automatic image colorization is a particularly challenging problem. Due to the high ill-posedness of the problem and its multi-modal uncertainty, directly training a deep neural network usually leads to incorrect semantic colors and low color richness. Existing transformer-based methods can deliver better results but depend heavily on hand-crafted, dataset-level empirical distribution priors. In this work, we propose DDColor, a new end-to-end method with dual decoders for image colorization. More specifically, we design a multi-scale image decoder and a transformer-based color decoder. The former restores the spatial resolution of the image, while the latter establishes the correlation between semantic representations and color queries via cross-attention. The two decoders cooperate to learn semantic-aware color embeddings by leveraging multi-scale visual features. With the help of these two decoders, our method succeeds in producing semantically consistent and visually plausible colorization results without any additional priors. In addition, a simple but effective colorfulness loss is introduced to further improve the color richness of generated results. Our extensive experiments demonstrate that the proposed DDColor achieves significantly superior performance to existing state-of-the-art works both quantitatively and qualitatively. Code will be made publicly available.
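To illustrate the core mechanism of the color decoder, the sketch below shows cross-attention in its simplest form: a small set of learnable color queries attends over image-feature tokens and returns attention-weighted feature mixtures. This is a dependency-free toy sketch under assumed shapes, not the paper's implementation; the variable names (`color_queries`, `image_feats`) and the toy dimensions are hypothetical.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_attention(queries, keys, values):
    # Each query attends over all keys (scaled dot-product);
    # its output is the attention-weighted sum of the values.
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# Hypothetical toy setup: 2 learnable color queries attend over
# 4 flattened image-feature tokens (dimension 2 for readability).
color_queries = [[1.0, 0.0], [0.0, 1.0]]
image_feats = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5], [0.2, 0.8]]
refined = cross_attention(color_queries, image_feats, image_feats)
```

In the actual method, the refined query embeddings would then be combined with the image decoder's multi-scale features to predict per-pixel colors; here we only demonstrate the attention step itself.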