Automatic image colorization is a particularly challenging problem. Because the problem is highly ill-posed and subject to multi-modal uncertainty, directly training a deep neural network usually leads to incorrect semantic colors and low color richness. Existing transformer-based methods can deliver better results but rely heavily on hand-crafted dataset-level empirical distribution priors. In this work, we propose DDColor, a new end-to-end method with dual decoders for image colorization. More specifically, we design a multi-scale image decoder and a transformer-based color decoder. The former restores the spatial resolution of the image, while the latter establishes the correlation between semantic representations and color queries via cross-attention. The two decoders work together to learn semantic-aware color embeddings by leveraging multi-scale visual features. With the help of these two decoders, our method produces semantically consistent and visually plausible colorization results without any additional priors. In addition, a simple yet effective colorfulness loss is introduced to further improve the color richness of the generated results. Extensive experiments demonstrate that the proposed DDColor achieves significantly superior performance over existing state-of-the-art works, both quantitatively and qualitatively. Code is publicly available at https://github.com/piddnad/DDColor.
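The core mechanism of the color decoder described above is cross-attention between learnable color queries and visual features. The following is a minimal NumPy sketch of that step, under illustrative assumptions: the query count, feature-map size, and embedding dimension are made up, and the real model would use learned projections and multiple heads rather than raw dot-product attention.

```python
# Hedged sketch: color queries attend over flattened visual features
# to produce semantic-aware color embeddings. Shapes and names are
# illustrative assumptions, not the authors' implementation.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, features, d_k):
    """queries:  (num_queries, d_k) learnable color queries
    features: (num_pixels, d_k)  flattened multi-scale visual features"""
    scores = queries @ features.T / np.sqrt(d_k)   # (num_queries, num_pixels)
    weights = softmax(scores, axis=-1)             # attention over spatial positions
    return weights @ features                      # (num_queries, d_k) color embeddings

rng = np.random.default_rng(0)
d = 64
color_queries = rng.standard_normal((100, d))      # e.g. 100 color queries (assumed)
visual_feats = rng.standard_normal((16 * 16, d))   # 16x16 feature map, flattened
color_embed = cross_attention(color_queries, visual_feats, d)
print(color_embed.shape)  # (100, 64)
```

Each row of the output is a color embedding aggregated from the spatial positions most relevant to that query, which is how the decoder ties color predictions to semantic content.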
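The abstract also mentions a colorfulness loss that rewards richer colors. The paper does not spell out its form here, but a standard image colorfulness measure such a loss could be built on is the Hasler-Süsstrunk metric; the sketch below computes that metric and is an assumption, not the authors' exact loss.

```python
# Hedged sketch: Hasler-Suesstrunk colorfulness metric, one plausible
# basis for a colorfulness loss (assumption; not the paper's definition).
import numpy as np

def colorfulness(img):
    """img: (H, W, 3) RGB array with values in [0, 255]."""
    r = img[..., 0].astype(float)
    g = img[..., 1].astype(float)
    b = img[..., 2].astype(float)
    rg = r - g                      # red-green opponent channel
    yb = 0.5 * (r + g) - b          # yellow-blue opponent channel
    std_root = np.sqrt(rg.std() ** 2 + yb.std() ** 2)
    mean_root = np.sqrt(rg.mean() ** 2 + yb.mean() ** 2)
    return std_root + 0.3 * mean_root

gray = np.full((32, 32, 3), 128.0)  # a uniform gray image has zero colorfulness
print(colorfulness(gray))  # 0.0
```

A training loss would negate (or invert) such a score so that minimizing it pushes generated images toward higher color richness.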