Many image enhancement or editing operations, such as forward and inverse tone mapping or color grading, do not have a unique solution; instead, there is a range of solutions, each representing a different style. Despite this, existing learning-based methods attempt to learn a unique mapping, disregarding this style. In this work, we show that information about the style can be distilled from collections of image pairs and encoded into a 2- or 3-dimensional vector. This gives us not only an efficient representation but also an interpretable latent space for editing the image style. We represent the global color mapping between a pair of images as a custom normalizing flow, conditioned on a polynomial basis of the pixel color. We show that such a network is more effective than PCA or a VAE at encoding image style in a low-dimensional space and lets us obtain an accuracy close to 40 dB, which is about a 7-10 dB improvement over state-of-the-art methods.
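To make the core architectural idea concrete, the sketch below shows one plausible way to build an invertible, style-conditioned global color mapping. It is a minimal illustration under assumptions, not the paper's exact network: it uses RealNVP-style affine coupling layers over the three color channels, where each layer's scale and shift are predicted from a degree-2 polynomial basis of the remaining channels together with a low-dimensional style vector. All names (`poly_basis`, `StyleCoupling`, `StyleFlow`) and hyperparameters here are hypothetical.

```python
# Hypothetical sketch of a style-conditioned normalizing flow over pixel
# colors; not the authors' exact architecture.
import torch
import torch.nn as nn

def poly_basis(x: torch.Tensor) -> torch.Tensor:
    """Degree-2 polynomial basis of an (N, k) tensor of channel values."""
    feats = [torch.ones_like(x[:, :1]), x, x ** 2]
    k = x.shape[1]
    for i in range(k):               # pairwise cross terms
        for j in range(i + 1, k):
            feats.append((x[:, i] * x[:, j]).unsqueeze(1))
    return torch.cat(feats, dim=1)

class StyleCoupling(nn.Module):
    """Affine coupling: transform one channel conditioned on the others + z."""
    def __init__(self, ch: int, style_dim: int, hidden: int = 32):
        super().__init__()
        self.ch = ch  # index of the channel this layer transforms
        in_dim = poly_basis(torch.zeros(1, 2)).shape[1] + style_dim
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),  # predicts per-pixel log-scale and shift
        )

    def _params(self, c, z):
        rest = torch.cat([c[:, :self.ch], c[:, self.ch + 1:]], dim=1)
        h = self.net(torch.cat([poly_basis(rest), z], dim=1))
        return h[:, :1], h[:, 1:]  # log_s, t

    def forward(self, c, z):
        log_s, t = self._params(c, z)
        out = c.clone()
        out[:, self.ch:self.ch + 1] = c[:, self.ch:self.ch + 1] * torch.exp(log_s) + t
        return out, log_s.squeeze(1)  # log-determinant of this layer

    def inverse(self, c, z):
        log_s, t = self._params(c, z)
        out = c.clone()
        out[:, self.ch:self.ch + 1] = (c[:, self.ch:self.ch + 1] - t) * torch.exp(-log_s)
        return out

class StyleFlow(nn.Module):
    """Stack couplings cycling over R, G, B so every channel is updated."""
    def __init__(self, style_dim: int = 2, n_layers: int = 6):
        super().__init__()
        self.layers = nn.ModuleList(
            [StyleCoupling(i % 3, style_dim) for i in range(n_layers)])

    def forward(self, c, z):
        log_det = torch.zeros(c.shape[0], device=c.device)
        for layer in self.layers:
            c, ld = layer(c, z)
            log_det = log_det + ld
        return c, log_det

# Usage: map N source pixels with a 2-D style vector (broadcast per pixel).
pixels = torch.rand(1024, 3)   # source colors in [0, 1]
z = torch.zeros(1024, 2)       # one style vector, repeated for each pixel
flow = StyleFlow(style_dim=2)
mapped, log_det = flow(pixels, z)
```

Because every coupling layer is invertible and the log-determinant is cheap to accumulate, such a flow can be trained by maximum likelihood on target colors while the 2- or 3-dimensional vector z remains directly editable; the paper's actual conditioning and training scheme may differ from this sketch.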