Image colorization is a well-known problem in computer vision. However, the task is ill-posed and therefore inherently challenging: a single grayscale image admits many plausible colorizations. Although researchers have made several attempts to automate the colorization pipeline, these methods often produce unrealistic results due to a lack of conditioning. In this work, we integrate textual descriptions as an auxiliary condition, alongside the grayscale image to be colorized, to improve the fidelity of the colorization process. To the best of our knowledge, this is one of the first attempts to incorporate textual conditioning into the colorization pipeline. To this end, we propose a novel deep network that takes two inputs (the grayscale image and the corresponding encoded text description) and predicts the relevant color gamut. Because the textual descriptions contain color information about the objects present in the scene, the text encoding helps improve the overall quality of the predicted colors. We have evaluated the proposed model using several metrics and found that it outperforms state-of-the-art colorization algorithms both qualitatively and quantitatively.
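The two-input design sketched above — fuse a grayscale-image encoding with a caption encoding, then predict the chrominance — can be caricatured in a few lines. This is a minimal numpy toy, not the paper's architecture: all dimensions, the random-projection "encoders", and the function names are illustrative placeholders standing in for learned convolutional and text encoders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for the sketch (not from the paper).
H, W = 32, 32      # grayscale image resolution
TEXT_DIM = 64      # size of the encoded text description
HIDDEN = 128       # fusion feature width

def encode_image(gray):
    # Stand-in for a learned image encoder: flatten + fixed random projection.
    W_img = rng.standard_normal((gray.size, HIDDEN)) * 0.01
    return np.tanh(gray.reshape(-1) @ W_img)

def encode_text(text_emb):
    # Stand-in for a learned text encoder.
    W_txt = rng.standard_normal((TEXT_DIM, HIDDEN)) * 0.01
    return np.tanh(text_emb @ W_txt)

def predict_ab(gray, text_emb):
    """Fuse the two conditions and predict per-pixel chrominance (2 channels)."""
    fused = np.concatenate([encode_image(gray), encode_text(text_emb)])
    W_out = rng.standard_normal((2 * HIDDEN, H * W * 2)) * 0.01
    ab = np.tanh(fused @ W_out)        # tanh keeps predictions in [-1, 1]
    return ab.reshape(H, W, 2)

gray = rng.random((H, W))                  # the image to be colorized
caption = rng.standard_normal(TEXT_DIM)    # pretend sentence embedding
ab = predict_ab(gray, caption)
print(ab.shape)  # (32, 32, 2)
```

The point of the sketch is only the data flow: both conditions are embedded into a shared feature space, concatenated, and decoded into two color channels that would be combined with the input luminance to form the final colorized image.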