Recent works on image harmonization formulate the problem as a pixel-wise image translation task solved by large autoencoders. These methods perform unsatisfactorily and infer slowly on high-resolution images. In this work, we observe that adjusting the input arguments of basic image filters, e.g., brightness and contrast, is sufficient for humans to produce realistic images from composite ones. Hence, we frame image harmonization as an image-level regression problem: learning the arguments of the filters that humans use for the task. We present Harmonizer, a framework for image harmonization. Unlike prior methods based on black-box autoencoders, Harmonizer contains a neural network that predicts filter arguments and several white-box filters that harmonize the image using the predicted arguments. We also introduce a cascade regressor and a dynamic loss strategy that let Harmonizer learn filter arguments more stably and precisely. Since our network outputs only image-level arguments and the filters we use are efficient, Harmonizer is much lighter and faster than existing methods. Comprehensive experiments demonstrate that Harmonizer notably outperforms existing methods, especially on high-resolution inputs. Finally, we apply Harmonizer to video harmonization, where it achieves consistent results across frames and runs at 56 fps at 1080p resolution. Code and models are available at: https://github.com/ZHKKKe/Harmonizer.
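As a rough illustration of the white-box half of this design, the sketch below applies two filters, brightness and contrast, with image-level scalar arguments to the masked foreground of a composite image and leaves the background untouched. The filter formulas, the argument range of [-1, 1], the filter order, and the names `brightness`, `contrast`, and `harmonize` are all hypothetical choices made for illustration, not Harmonizer's actual implementation. The argument-prediction network is omitted; its output is stood in for by a plain dict.

```python
import numpy as np

# Hypothetical white-box filters: each takes an image with values in [0, 1]
# and one scalar argument in [-1, 1]; arg = 0 leaves the image unchanged.
# These formulas are illustrative assumptions, not Harmonizer's exact filters.

def brightness(img: np.ndarray, arg: float) -> np.ndarray:
    # Scale all intensities multiplicatively by (1 + arg).
    return np.clip(img * (1.0 + arg), 0.0, 1.0)


def contrast(img: np.ndarray, arg: float, mask: np.ndarray) -> np.ndarray:
    # Stretch (arg > 0) or compress (arg < 0) intensities around the mean
    # intensity of the masked foreground region.
    m = mask[..., None]                                   # (H, W) -> (H, W, 1)
    mean = (img * m).sum() / (m.sum() * img.shape[-1] + 1e-8)
    return np.clip((img - mean) * (1.0 + arg) + mean, 0.0, 1.0)


def harmonize(composite: np.ndarray, mask: np.ndarray, args: dict) -> np.ndarray:
    """Apply the filters with image-level arguments, then keep the result
    only inside the foreground mask (a soft mask in [0, 1] also works)."""
    out = brightness(composite, args["brightness"])
    out = contrast(out, args["contrast"], mask)
    m = mask[..., None]
    # Blend: filtered pixels in the foreground, original pixels elsewhere.
    return m * out + (1.0 - m) * composite


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    composite = rng.random((4, 4, 3))                     # H x W x RGB in [0, 1]
    mask = np.zeros((4, 4))
    mask[1:3, 1:3] = 1.0                                  # pasted foreground region
    # Stand-in for the network's prediction: one scalar per filter.
    predicted = {"brightness": 0.2, "contrast": -0.1}
    result = harmonize(composite, mask, predicted)
    print(result.shape)
```

In this sketch, each filter is a few per-pixel arithmetic operations driven by a single scalar, so its cost grows only linearly with the pixel count. This is consistent with the abstract's point that predicting image-level arguments and using efficient filters keeps inference cheap at high resolution.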