Most existing face image Super-Resolution (SR) methods assume that the Low-Resolution (LR) images were artificially downsampled from High-Resolution (HR) images with bicubic interpolation. This operation changes the natural image characteristics and reduces noise. Hence, SR methods trained on such data most often fail to produce good results when applied to real LR images. To solve this problem, we propose a novel framework for generating realistic LR/HR training pairs. Our framework estimates realistic blur kernels, noise distributions, and JPEG compression artifacts to generate LR images with similar image characteristics as the ones in the source domain. This allows us to train an SR model using high-quality face images as Ground-Truth (GT). For better perceptual quality we use a Generative Adversarial Network (GAN) based SR model in which we replace the commonly used VGG-loss [24] with the LPIPS-loss [52]. Experimental results on both real and artificially corrupted face images show that our method produces more detailed reconstructions with less noise compared to existing State-of-the-Art (SoTA) methods. In addition, we show that traditional no-reference Image Quality Assessment (IQA) methods fail to capture this improvement and demonstrate that the more recent NIMA metric [16] correlates better with human perception as measured by Mean Opinion Rank (MOR).
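To make the degradation idea concrete, the sketch below shows one possible way to generate an LR image from an HR face by blurring with an estimated kernel, downsampling, adding noise, and applying JPEG compression. The kernel, noise level, and quality factor are illustrative placeholders, not the distributions estimated by our framework, and the function names are assumptions rather than part of the proposed implementation.

```python
# Minimal sketch of LR/HR pair generation under assumed degradation parameters.
import cv2
import numpy as np

def degrade(hr, kernel, noise_sigma=5.0, jpeg_quality=60, scale=4):
    """Turn an HR face image (uint8, HxWx3) into a realistically degraded LR image."""
    # 1) Blur with the (estimated) kernel.
    blurred = cv2.filter2D(hr, -1, kernel)
    # 2) Downsample to the LR resolution.
    h, w = blurred.shape[:2]
    lr = cv2.resize(blurred, (w // scale, h // scale), interpolation=cv2.INTER_AREA)
    # 3) Add Gaussian noise as a stand-in for the estimated noise distribution.
    lr = lr.astype(np.float32) + np.random.normal(0.0, noise_sigma, lr.shape)
    lr = np.clip(lr, 0, 255).astype(np.uint8)
    # 4) Simulate JPEG compression artifacts by encoding and decoding.
    _, buf = cv2.imencode(".jpg", lr, [cv2.IMWRITE_JPEG_QUALITY, jpeg_quality])
    return cv2.imdecode(buf, cv2.IMREAD_COLOR)

# Example: an isotropic Gaussian kernel as a placeholder for an estimated blur kernel.
k1d = cv2.getGaussianKernel(13, 2.0)
kernel = k1d @ k1d.T
```

The resulting LR image, paired with the original HR image as GT, would form one training example for the SR model.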