Image rescaling aims to learn the optimal low-resolution (LR) image that can be accurately reconstructed to its original high-resolution (HR) counterpart, providing an efficient image processing and storage method for ultra-high definition media. However, extreme downscaling factors pose significant challenges to the upscaling process due to its highly ill-posed nature, causing existing image rescaling methods to struggle in generating semantically correct structures and perceptual friendly textures. In this work, we propose a novel framework called One-Step Image Rescaling Diffusion (OSIRDiff) for extreme image rescaling, which performs rescaling operations in the latent space of a pre-trained autoencoder and effectively leverages powerful natural image priors learned by a pre-trained text-to-image diffusion model. Specifically, OSIRDiff adopts a pseudo-invertible module to establish the bidirectional mapping between the latent features of the HR image and the target-sized LR image. Then, the rescaled features are refined by a pre-trained diffusion model to generate more faithful and visually pleasing details. The entire model is end-to-end trained to enable the diffusion priors to guide the rescaling process. Considering the spatially non-uniform reconstruction quality of the rescaled latent features, we propose a novel time-step alignment strategy, which can adaptively determine the generative strength of the diffusion model based on the degree of latent reconstruction errors. Extensive experiments demonstrate the superiority of OSIRDiff over previous methods in both quantitative and qualitative evaluations.
翻译:暂无翻译