We present the Colorization Transformer, a novel approach for diverse, high-fidelity image colorization based on self-attention. Given a grayscale image, colorization proceeds in three steps. We first use a conditional autoregressive transformer to produce a low-resolution coarse coloring of the grayscale image. Our architecture adopts conditional transformer layers to condition effectively on the grayscale input. Two subsequent fully parallel networks then upsample the coarsely colored low-resolution image into a finely colored high-resolution image. Sampling from the Colorization Transformer produces diverse colorings whose fidelity surpasses the previous state of the art on colorizing ImageNet, as measured both by FID and by a human evaluation in a Mechanical Turk test. Remarkably, in more than 60% of cases human evaluators prefer the highest rated among three generated colorings over the ground truth. The code and pre-trained checkpoints for Colorization Transformer are publicly available at https://github.com/google-research/google-research/tree/master/coltran
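The three-step data flow described above can be sketched in a few lines of NumPy. This is only a toy illustration of the staging, not the released model: every function name, the hand-written sampling rule, the random palette, and the nearest-neighbor upsampling are assumptions that stand in for the learned transformer networks.

```python
import numpy as np


def downsample_gray(gray, factor):
    """Average-pool a (H, W) grayscale image by an integer factor."""
    h, w = gray.shape
    return gray.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


def sample_coarse_colors(gray_low, num_bins, rng):
    """Step 1 (toy stand-in): sample one coarse color symbol per pixel in
    raster order, conditioned on the grayscale value and on the symbol
    sampled just before it. A learned autoregressive transformer would
    supply these probabilities in the real system."""
    h, w = gray_low.shape
    coarse = np.zeros((h, w), dtype=np.int64)
    bins = np.arange(num_bins)
    for i in range(h):
        for j in range(w):
            prev = coarse[i, j - 1] if j > 0 else 0
            target = gray_low[i, j] / 255.0 * (num_bins - 1)
            # Hypothetical logits: favor symbols near the grayscale level
            # and near the previously sampled symbol.
            logits = -0.5 * (bins - target) ** 2 - 0.1 * np.abs(bins - prev)
            probs = np.exp(logits - logits.max())
            probs /= probs.sum()
            coarse[i, j] = rng.choice(num_bins, p=probs)
    return coarse


def color_upsample(coarse, palette):
    """Step 2 (toy stand-in): map every coarse symbol to 8-bit RGB at once,
    with no dependence between pixels (fully parallel)."""
    return palette[coarse]


def spatial_upsample(rgb_low, factor):
    """Step 3 (toy stand-in): nearest-neighbor upsampling to full resolution,
    again computed for all pixels in parallel."""
    return np.repeat(np.repeat(rgb_low, factor, axis=0), factor, axis=1)


def colorize(gray, factor=4, num_bins=8, seed=0):
    """Run the three toy stages; a different seed gives a different sample."""
    rng = np.random.default_rng(seed)
    # Fixed hypothetical palette so that only step 1 is stochastic.
    palette = np.random.default_rng(0).integers(
        0, 256, size=(num_bins, 3), dtype=np.uint8
    )
    gray_low = downsample_gray(gray, factor)
    coarse = sample_coarse_colors(gray_low, num_bins, rng)
    rgb_low = color_upsample(coarse, palette)
    return spatial_upsample(rgb_low, factor)


if __name__ == "__main__":
    gray = np.arange(64, dtype=np.uint8).reshape(8, 8)
    for seed in range(3):
        print(seed, colorize(gray, seed=seed).shape)
```

Only the first stage draws random samples one pixel at a time; the two upsampling stages are deterministic and parallel in this sketch, which mirrors the split between the autoregressive step and the two parallel networks. Drawing several samples with different seeds corresponds to generating multiple candidate colorings of the same grayscale input.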