We present DialogPaint, an innovative framework that employs an interactive conversational approach to image editing. The framework comprises a pretrained dialogue model (Blenderbot) and a diffusion model (Stable Diffusion). The dialogue model converses with users to understand their requirements and generates concise editing instructions from the dialogue. The Stable Diffusion model then takes these instructions, together with the input image, to produce the desired output. Because fine-tuning data for such models is difficult to acquire, we leverage multiple large-scale models to generate simulated dialogues and corresponding image pairs. After fine-tuning our framework on the synthesized data, we evaluate its performance in real-world application scenarios. The results demonstrate that DialogPaint excels in both objective and subjective evaluation metrics, effectively handling ambiguous instructions and performing tasks such as object replacement, style transfer, and color modification. Moreover, our framework supports multi-round editing, enabling the completion of complex editing tasks.
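The two-stage pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `EditSession` class, the `edit_image` function, and the callable interfaces for the dialogue and diffusion models are all hypothetical names introduced here, assuming only the abstract's description (a dialogue model distils the conversation into a concise instruction, which then conditions an instruction-following diffusion edit).

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class EditSession:
    """Accumulates a multi-round editing dialogue (hypothetical helper)."""
    turns: List[str] = field(default_factory=list)

    def add_user_turn(self, text: str) -> None:
        self.turns.append(f"User: {text}")

    def dialogue_context(self) -> str:
        # The dialogue model consumes this context to produce a short
        # imperative instruction, e.g. "replace the dog with a cat".
        return "\n".join(self.turns)

def edit_image(session: EditSession, image,
               dialogue_model: Callable[[str], str],
               diffusion_edit: Callable[[str, object], object]):
    # Stage 1: the dialogue model summarizes the conversation into one
    # concise instruction (the paper uses Blenderbot for this role).
    instruction = dialogue_model(session.dialogue_context())
    # Stage 2: an instruction-conditioned diffusion model (Stable Diffusion
    # in the paper) applies the edit to the input image.
    return diffusion_edit(instruction, image)
```

Multi-round editing falls out of this structure naturally: each new user turn extends the session, and the summarized instruction is regenerated against the full dialogue before the next diffusion pass.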