We introduce ScoreFusion, a theoretically grounded method for fusing multiple pre-trained diffusion models that are assumed to generate from auxiliary populations. ScoreFusion is particularly useful for enhancing the generative modeling of a target population with limited observed data. Our starting point considers the family of KL barycenters of the auxiliary populations, which is proven to be an optimal parametric class in the KL sense, but difficult to learn. Nevertheless, by recasting the learning problem as score matching in denoising diffusion, we obtain a tractable way of computing the optimal KL barycenter weights. We prove a dimension-free sample complexity bound in total variation distance, provided that the auxiliary models are well fitted for their own task and the auxiliary tasks combined capture the target well. We also explain a connection of the practice of checkpoint merging in AI art creation to an approximation of our KL-barycenter-based fusion approach. However, our fusion method differs in key aspects, allowing generation of new populations, as we illustrate in experiments.
翻译:暂无翻译