Recently, instruction-based image editing (IIE) has received widespread attention. In practice, IIE often modifies only specific regions of an image, while the remaining areas remain largely unchanged. Although these two types of regions differ substantially in generation difficulty and computational redundancy, existing IIE models do not account for this distinction and instead apply a uniform generation process to the entire image. This motivates us to propose RegionE, an adaptive, region-aware generation framework that accelerates IIE tasks without additional training. The RegionE framework consists of three main components: 1) Adaptive Region Partition. We observe that the denoising trajectory of unedited regions is nearly straight, so multi-step denoised predictions can be inferred in a single step. In the early denoising stages, we therefore partition the image into edited and unedited regions based on the difference between the estimated final result and the reference image. 2) Region-Aware Generation. Once the regions are distinguished, we replace multi-step denoising with one-step prediction for unedited areas. For edited regions, whose trajectory is curved, local iterative denoising is still required. To improve the efficiency and quality of this local iterative generation, we propose the Region-Instruction KV Cache, which reduces computational cost while incorporating global information. 3) Adaptive Velocity Decay Cache. Observing that adjacent timesteps in edited regions exhibit strong velocity similarity, we further propose an adaptive velocity decay cache to accelerate the local denoising process. We applied RegionE to state-of-the-art IIE base models, including Step1X-Edit, FLUX.1 Kontext, and Qwen-Image-Edit, achieving acceleration factors of 2.57×, 2.41×, and 2.06×, respectively. Evaluations by GPT-4o confirm that semantic and perceptual fidelity are well preserved.
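To make the partition step concrete, the following is a minimal PyTorch sketch of how edited and unedited regions might be separated. It assumes a rectified-flow convention in which the clean latent can be estimated in one step as x_t − t·v; the function name, the threshold `tau`, and the patch size are illustrative assumptions rather than the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def partition_regions(x_t, v_pred, t, reference_latent, tau=0.1, patch=2):
    """Sketch of adaptive region partition (hypothetical helper).

    Assumes a rectified-flow convention x_t = (1 - t) * x_clean + t * noise,
    so the one-step estimate of the clean latent is x_t - t * v_pred.
    Patches whose estimate stays close to the reference-image latent are
    treated as unedited; the rest are marked as edited.
    """
    # One-step estimate of the final (clean) latent from the current state.
    x_clean_est = x_t - t * v_pred

    # Per-pixel mean absolute difference against the reference latent,
    # then averaged over small patches.
    diff = (x_clean_est - reference_latent).abs().mean(dim=1, keepdim=True)  # (B, 1, H, W)
    diff = F.avg_pool2d(diff, kernel_size=patch)                              # (B, 1, H/p, W/p)

    # True where the instruction appears to change content.
    edited_mask = diff > tau
    return edited_mask, x_clean_est
```

In this sketch, unedited patches could take `x_clean_est` directly as their final value, while edited patches continue through the local iterative denoising loop.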
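The adaptive velocity decay cache exploits the similarity of velocities at adjacent timesteps in the edited region. Below is a toy sketch of that idea; the class, the fixed decay factor, and the refresh schedule are hypothetical placeholders for whatever adaptive rule the method actually uses.

```python
import torch

class VelocityDecayCache:
    """Toy sketch of a velocity-decay cache (names and schedule are assumptions).

    Every `refresh_every` steps the full model is evaluated on the edited
    region; in between, the cached velocity is reused, scaled by a decay
    factor, exploiting the similarity of velocities at adjacent timesteps.
    """

    def __init__(self, decay=0.95, refresh_every=2):
        self.decay = decay
        self.refresh_every = refresh_every
        self.cached_v = None
        self.step = 0

    def velocity(self, model, x_t, t):
        if self.cached_v is None or self.step % self.refresh_every == 0:
            # Full forward pass on the edited region (expensive).
            self.cached_v = model(x_t, t)
        else:
            # Cheap step: reuse the cached velocity with mild decay.
            self.cached_v = self.decay * self.cached_v
        self.step += 1
        return self.cached_v
```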