During image editing, existing deep generative models tend to re-synthesize the entire output from scratch, including the unedited regions. This leads to a significant waste of computation, especially for minor editing operations. In this work, we present Spatially Sparse Inference (SSI), a general-purpose technique that selectively performs computation for edited regions and accelerates various generative models, including both conditional GANs and diffusion models. Our key observation is that users tend to gradually change the input image. This motivates us to cache and reuse the feature maps of the original image. Given an edited image, we sparsely apply the convolutional filters to the edited regions while reusing the cached features for the unedited areas. Based on our algorithm, we further propose the Sparse Incremental Generative Engine (SIGE) to convert the computation reduction into latency reduction on off-the-shelf hardware. With edits covering about $1\%$ of the image area, our method reduces the computation of DDPM by $7.5\times$, Stable Diffusion by $8.2\times$, and GauGAN by $18\times$ while preserving visual fidelity. With SIGE, we accelerate the inference of DDPM by $3.0\times$ on an NVIDIA RTX 3090 and $6.6\times$ on an Apple M1 Pro CPU, Stable Diffusion by $7.2\times$ on the RTX 3090, and GauGAN by $5.6\times$ on the RTX 3090 and $14\times$ on the M1 Pro CPU.
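The cache-and-sparsely-recompute idea can be sketched in a few lines. The following is a minimal single-channel NumPy illustration of tile-based sparse convolution, not the paper's actual engine: the tiling scheme, function names, and tile size are our own illustrative assumptions. We cache the original input and output feature maps, then recompute only the tiles whose $3\times 3$ receptive field overlaps an edited pixel, reusing the cached output everywhere else.

```python
import numpy as np

def conv3x3(x, w):
    """Dense 3x3 'same' convolution on a single-channel map (reference)."""
    H, W = x.shape
    xp = np.pad(x, 1)
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(xp[i:i + 3, j:j + 3] * w)
    return out

def sparse_incremental_conv3x3(edited, cached_in, cached_out, w, tile=4):
    """Recompute only tiles whose receptive field touches an edited pixel;
    reuse cached outputs for all other tiles (an illustration of SSI)."""
    H, W = edited.shape
    diff = edited != cached_in          # mask of edited pixels
    out = cached_out.copy()             # start from the cached output
    xp = np.pad(edited, 1)
    for i0 in range(0, H, tile):
        for j0 in range(0, W, tile):
            i1, j1 = min(i0 + tile, H), min(j0 + tile, W)
            # Dilate the tile by 1 pixel: a 3x3 kernel reads neighbors.
            if not diff[max(i0 - 1, 0):i1 + 1, max(j0 - 1, 0):j1 + 1].any():
                continue                # tile unaffected: keep cached output
            for i in range(i0, i1):     # recompute only this tile
                for j in range(j0, j1):
                    out[i, j] = np.sum(xp[i:i + 3, j:j + 3] * w)
    return out

# A tiny edit leaves almost every tile untouched, yet the sparse result
# matches a full dense re-computation exactly.
rng = np.random.default_rng(0)
w = rng.standard_normal((3, 3))
orig = rng.standard_normal((16, 16))
cached_out = conv3x3(orig, w)
edited = orig.copy()
edited[5, 7] += 1.0                     # a small localized edit
sparse = sparse_incremental_conv3x3(edited, orig, cached_out, w)
assert np.allclose(sparse, conv3x3(edited, w))
```

In practice the real system operates on multi-channel feature maps inside the network and uses gather/scatter kernels to turn this computation reduction into wall-clock speedups, but the tile-level skip logic above captures the core mechanism.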