A persistent challenge in conditional image synthesis is generating diverse output images from the same input image when only one output image is observed per input. GAN-based methods are prone to mode collapse, which leads to low sample diversity. To get around this, we leverage Implicit Maximum Likelihood Estimation (IMLE), which overcomes mode collapse fundamentally. IMLE uses the same generator as GANs but trains it with a different, non-adversarial objective that ensures each observed image has a generated sample nearby. Unfortunately, to generate high-fidelity images, prior IMLE-based methods require a large number of samples, which is expensive. In this paper, we propose a new method, which we dub Conditional Hierarchical IMLE (CHIMLE), that sidesteps this limitation and generates high-fidelity images without requiring many samples. We show that CHIMLE significantly outperforms the prior best IMLE-, GAN- and diffusion-based methods in terms of image fidelity and mode coverage across four tasks, namely night-to-day, 16× single-image super-resolution, image colourization and image decompression. Quantitatively, our method improves Fr\'echet Inception Distance (FID) by 36.9% on average compared to the prior best IMLE-based method, and by 27.5% on average compared to the best non-IMLE-based general-purpose methods.
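To make the non-adversarial objective concrete, the following is a minimal sketch of the core IMLE idea described above: for each observed image, draw several latent samples, keep the nearest generated sample, and pull it toward the observation. Everything here is illustrative, not the paper's method: a toy linear "generator" stands in for CHIMLE's deep conditional network, the latent pool is fixed for determinism (real IMLE resamples), and all names and dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "generator" G(z) = W @ z; a stand-in for a deep network.
d_z, d_x, n_data, n_samples = 8, 16, 4, 64
W = 0.1 * rng.normal(size=(d_x, d_z))
data = rng.normal(size=(n_data, d_x))          # "observed" outputs
Z = rng.normal(size=(n_samples, d_z))          # fixed latent pool (demo only)

def imle_loss(W):
    """Mean distance from each observed x to its NEAREST generated sample.
    Because every data point is matched to some sample, no observation can
    be ignored -- this is what rules out mode collapse by construction."""
    gen = Z @ W.T                                    # (n_samples, d_x)
    d2 = ((gen[None] - data[:, None]) ** 2).sum(-1)  # pairwise sq. distances
    return d2.min(axis=1).mean()

def imle_step(W, lr=0.01):
    """One gradient step: pull each x's nearest sample toward x."""
    gen = Z @ W.T
    d2 = ((gen[None] - data[:, None]) ** 2).sum(-1)
    j = d2.argmin(axis=1)                  # index of nearest sample per x
    diff = gen[j] - data                   # (n_data, d_x)
    grad = 2.0 * diff.T @ Z[j] / len(data) # d/dW of mean ||W z_j - x||^2
    return W - lr * grad

loss_before = imle_loss(W)
for _ in range(50):
    W = imle_step(W)
assert imle_loss(W) < loss_before          # objective decreases
```

The expense the abstract refers to shows up in `n_samples`: fidelity improves as the pool grows, since some sample is more likely to land near each observation, but generating that pool dominates the cost; CHIMLE's contribution is reaching high fidelity without a large pool.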