Diffusion models have achieved remarkable success in text-to-image generation, enabling the creation of high-quality images from text prompts or other modalities. However, existing methods for customizing these models are limited in handling multiple personalized subjects and are prone to overfitting. Moreover, their large parameter count makes model storage inefficient. In this paper, we propose a novel approach that addresses these limitations of existing text-to-image diffusion models for personalization. Our method fine-tunes only the singular values of the weight matrices, yielding a compact and efficient parameter space that reduces the risk of overfitting and language drift. We also propose a Cut-Mix-Unmix data-augmentation technique to improve the quality of multi-subject image generation, along with a simple text-based image editing framework. Our proposed SVDiff method has a significantly smaller model size (1.7MB for Stable Diffusion) than existing methods (vanilla DreamBooth 3.66GB, Custom Diffusion 73MB), making it more practical for real-world applications.
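To make the core idea of fine-tuning singular values concrete, here is a minimal PyTorch sketch. It assumes the general recipe of decomposing a pretrained weight with SVD, freezing the singular vectors, and training only a shift on the singular-value spectrum; the helper names (`svdiff_parametrize`, `delta`, `reconstruct`) are illustrative, not the paper's official API, and the non-negativity via ReLU is an assumption about how the spectral shift is constrained.

```python
import torch

def svdiff_parametrize(weight: torch.Tensor):
    """Expose only a singular-value shift of a frozen weight as trainable.

    Sketch of the SVDiff idea: keep U and V fixed, learn a small delta
    on the spectrum, and reconstruct the effective weight on the fly.
    """
    # Full SVD of the pretrained weight; U and Vh stay frozen.
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    delta = torch.nn.Parameter(torch.zeros_like(S))  # the only trainable tensor

    def reconstruct():
        # Shifted spectrum; ReLU keeps singular values non-negative (assumption).
        return U @ torch.diag(torch.relu(S + delta)) @ Vh

    return delta, reconstruct

# Usage sketch: fine-tune only `delta` for one projection matrix.
W = torch.randn(320, 320)            # stands in for a pretrained UNet weight
delta, reconstruct = svdiff_parametrize(W)
opt = torch.optim.AdamW([delta], lr=1e-3)

loss = reconstruct().sum()           # placeholder for the diffusion training loss
loss.backward()
opt.step()
```

Because only one vector of singular-value shifts is stored per weight matrix, the checkpoint stays tiny relative to full fine-tuning, which is consistent with the 1.7MB model size reported above.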