Diffusion models have emerged as a leading technique for image generation because they can produce high-resolution, realistic images. Despite their strong performance, diffusion models still struggle with image collections that exhibit significant feature differences: they often fail to capture complex features and can produce conflicting results. Prior research has attempted to address this issue by learning different regions of an image through multiple diffusion paths and then combining the results. However, this approach suffers from inefficient coordination among the paths and high computational cost. To tackle these issues, this paper presents the Diffusion Fuzzy System (DFS), a latent-space multi-path diffusion model guided by fuzzy rules. DFS offers several advantages. First, traditional multi-path diffusion methods divide the work by image region, whereas DFS dedicates each diffusion path to learning a specific class of image features. By assigning each path to a different feature type, DFS overcomes the limitations of existing multi-path models in capturing heterogeneous image features. Second, DFS employs rule-chain-based reasoning to dynamically steer the diffusion process and coordinate the multiple paths efficiently. Finally, DFS introduces a fuzzy membership-based latent-space compression mechanism that effectively reduces the computational cost of multi-path diffusion. We evaluated our method on three public datasets: LSUN Bedroom, LSUN Church, and MS COCO. The results show that DFS achieves more stable training and faster convergence than existing single-path and multi-path diffusion models. Additionally, DFS outperforms baseline models in both image quality and text-image alignment, and its generated images match target references more accurately.
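As a rough illustration of how fuzzy membership degrees can coordinate several diffusion paths, the following minimal Python sketch uses a generic Takagi-Sugeno-Kang-style combination: each path is paired with a fuzzy rule whose Gaussian membership function fires according to how close a latent vector lies to the rule's center, and the normalized firing strengths weight the paths' outputs. This is an assumption-laden sketch, not the DFS method. The rule centers, the shared width `sigma`, and the toy `paths` (stand-ins for per-path noise predictors) are hypothetical, and the paper's rule-chain reasoning and latent-space compression are not modeled.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
# Hypothetical stand-in for one diffusion path's prediction function.
PathFn = Callable[[Sequence[float]], Vector]


def gaussian_membership(x: Sequence[float], center: Sequence[float], sigma: float) -> float:
    """Rule firing strength: product of per-dimension Gaussian memberships,
    computed as exp(-||x - center||^2 / (2 * sigma^2))."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def normalized_firing_strengths(
    x: Sequence[float], centers: Sequence[Sequence[float]], sigma: float
) -> Vector:
    """Normalize the firing strengths of all rules so they sum to 1."""
    raw = [gaussian_membership(x, c, sigma) for c in centers]
    total = sum(raw)
    if total == 0.0:
        # Fall back to uniform weights if every rule underflows to zero.
        return [1.0 / len(raw)] * len(raw)
    return [r / total for r in raw]


def fuzzy_combine(
    x: Sequence[float],
    paths: Sequence[PathFn],
    centers: Sequence[Sequence[float]],
    sigma: float,
) -> Vector:
    """Weight each path's output by its rule's normalized firing strength
    (a convex combination of the path outputs)."""
    weights = normalized_firing_strengths(x, centers, sigma)
    outputs = [path(x) for path in paths]
    return [
        sum(w * out[d] for w, out in zip(weights, outputs))
        for d in range(len(x))
    ]


if __name__ == "__main__":
    # Two toy "paths" and two rule centers (all hypothetical).
    paths = [lambda v: [2.0 * e for e in v], lambda v: [e + 1.0 for e in v]]
    centers = [[1.0, 1.0], [-1.0, -1.0]]
    latent = [0.0, 0.0]  # equidistant from both centers -> equal weights
    print(normalized_firing_strengths(latent, centers, sigma=1.0))
    print(fuzzy_combine(latent, paths, centers, sigma=1.0))
```

Because the weights are normalized, a latent that sits close to one rule's center is handled almost entirely by that rule's path, while latents between centers blend the paths smoothly. This soft routing is the general property that fuzzy-rule guidance exploits when different paths specialize in different feature types.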