Image colourisation is an ill-posed problem, with multiple correct solutions that depend on the context and the object instances present in the input image. Previous approaches attacked the problem either by requiring intensive user interaction or by exploiting the ability of convolutional neural networks (CNNs) to learn image-level (context) features. However, obtaining human hints is not always feasible, and CNNs alone cannot learn object-level semantics unless multiple models pretrained with supervision are considered. In this work, we propose a single network, named UCapsNet, that separates image-level features obtained through convolutions from object-level features captured by means of capsules. Then, through skip connections across different layers, we enforce collaboration between such disentangled factors to produce high-quality and plausible image colourisations. We pose the problem as a classification task that can be addressed by a fully self-supervised approach, thus requiring no human effort. Experimental results on three benchmark datasets show that our approach outperforms existing methods on standard quality metrics and achieves state-of-the-art performance on image colourisation. A large-scale user study shows that our method is preferred over existing solutions.
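To make the colourisation-as-classification formulation concrete, below is a minimal sketch of the self-supervised training loop it implies: the ground-truth ab chrominance is quantised into a fixed set of colour bins, so the per-pixel class labels come from the image itself and no human annotation is required. The bin count, the helper names, and the training-step structure are illustrative assumptions, not the exact UCapsNet configuration.

```python
import torch
import torch.nn.functional as F

# Assumption: K quantised ab bins (313 follows common practice in
# classification-based colourisation; the paper's value may differ).
K = 313

def ab_to_class(ab, bin_centres):
    """Map each pixel's (a, b) pair to the index of its nearest colour bin.

    ab:          (B, 2, H, W) ground-truth chrominance channels
    bin_centres: (K, 2) quantised ab bin centres
    returns:     (B, H, W) integer class targets derived from the image itself
    """
    b, _, h, w = ab.shape
    flat = ab.permute(0, 2, 3, 1).reshape(-1, 2)   # (B*H*W, 2)
    dist = torch.cdist(flat, bin_centres)          # distance to every bin
    return dist.argmin(dim=1).view(b, h, w)

def train_step(model, optimiser, L, ab, bin_centres):
    """One self-supervised step: the network sees only the luminance L and
    predicts a per-pixel distribution over the K colour bins."""
    targets = ab_to_class(ab, bin_centres)  # labels require no human effort
    logits = model(L)                       # (B, K, H, W) class scores
    loss = F.cross_entropy(logits, targets)
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
    return loss.item()
```

Because the targets are computed from the colour image itself, any large unlabelled image collection can serve as training data, which is what makes the approach fully self-supervised.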