Image super-resolution is a one-to-many problem, but most deep-learning-based methods provide only a single solution. In this work, we tackle the problem of diverse super-resolution by reusing VD-VAE, a state-of-the-art variational autoencoder (VAE). We find that the hierarchical latent representation learned by VD-VAE naturally separates the low-frequency information of an image, encoded in the latent groups at the top of the hierarchy, from its high-frequency details, determined by the latent groups at the bottom of the hierarchy. Starting from this observation, we design a super-resolution model that exploits the specific structure of the VD-VAE latent space. Specifically, we train an encoder to map low-resolution images to the subset of the VD-VAE latent space that encodes the low-frequency information, and we combine this encoder with the VD-VAE generative model to sample diverse super-resolved versions of a low-resolution input. We demonstrate the ability of our method to generate diverse solutions to the super-resolution problem on face super-resolution with upsampling factors ×4, ×8, and ×16.
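The sampling scheme described above can be illustrated with a small toy sketch. It is not the VD-VAE model and not our implementation: `toy_encoder`, `toy_decoder`, `sample_super_resolved`, and `downsample` are hypothetical names, the "top" latent is simply the low-resolution image, and the "bottom" latent is Gaussian noise made zero-mean within each block. The sketch only shows the principle of fixing the low-frequency latents from the input while sampling the high-frequency latents from a prior.

```python
import numpy as np

def toy_encoder(lr_image):
    # Hypothetical stand-in for the trained low-resolution encoder:
    # here the "top" (low-frequency) latent is just the LR image itself.
    return lr_image.astype(np.float64)

def toy_decoder(z_top, z_bottom, factor):
    # Hypothetical stand-in for the generative model.
    # The top latent sets the coarse content via nearest-neighbour upsampling.
    coarse = np.kron(z_top, np.ones((factor, factor)))
    # The bottom latent adds detail; removing each block's mean keeps the
    # detail purely high-frequency, so it never changes the block average.
    h, w = z_top.shape
    blocks = z_bottom.reshape(h, factor, w, factor)
    detail = blocks - blocks.mean(axis=(1, 3), keepdims=True)
    return coarse + detail.reshape(h * factor, w * factor)

def sample_super_resolved(lr_image, factor, num_samples, seed=0):
    # Fix the top latent from the LR input, then draw a fresh bottom
    # latent from a standard normal prior for every sample.
    rng = np.random.default_rng(seed)
    z_top = toy_encoder(lr_image)
    h, w = lr_image.shape
    samples = []
    for _ in range(num_samples):
        z_bottom = rng.standard_normal((h * factor, w * factor))
        samples.append(toy_decoder(z_top, z_bottom, factor))
    return np.stack(samples)

def downsample(hr_image, factor):
    # Block-average downsampling, used here to check consistency with the input.
    h, w = hr_image.shape
    return hr_image.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

if __name__ == "__main__":
    lr = np.arange(16, dtype=np.float64).reshape(4, 4)
    samples = sample_super_resolved(lr, factor=4, num_samples=3)
    print(samples.shape)
    print(np.allclose(downsample(samples[0], 4), lr))
```

In this toy setting, every sample differs in its fine detail yet downsamples back to the same low-resolution input, which is the one-to-many behaviour that diverse super-resolution aims for. In the actual method, the learned encoder and the VD-VAE decoder would take the place of these hand-written stand-ins.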