We study the problem of generating minority samples with diffusion models. Minority samples are instances that lie in low-density regions of a data manifold. Generating enough of them is important because they often carry unique attributes of the data. However, the conventional generation process of diffusion models mostly yields majority samples (those lying in high-density regions of the manifold) because of their high likelihoods, which makes it inefficient and time-consuming for this task. In this work, we present a novel framework that makes the generation process of diffusion models focus on minority samples. We first provide a new insight into the majority-focused nature of diffusion models: they denoise in favor of majority samples. This observation motivates us to introduce a metric that describes the uniqueness of a given sample. To address the inherent preference of diffusion models for majority samples, we further develop minority guidance, a sampling technique that guides the generation process toward regions with desired likelihood levels. Experiments on real benchmark datasets demonstrate that minority guidance greatly improves the ability to generate low-likelihood minority samples compared with existing generative frameworks, including the standard diffusion sampler.
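The claim that diffusion models denoise in favor of majority samples can be illustrated on a toy problem. The sketch below is not the metric or the sampler proposed in this work. It assumes a hypothetical one-dimensional two-mode Gaussian mixture, additive Gaussian noise, and the exact posterior-mean denoiser, and every constant and function name in it is chosen purely for illustration. It perturbs a point from each mode, denoises it, and compares how far the reconstruction drifts from the original.

```python
import math
import random

# Toy 1-D data distribution: a two-component Gaussian mixture.
# All numbers below are illustrative assumptions, not values from the paper.
WEIGHTS = (0.95, 0.05)   # majority mode, minority mode
MEANS = (0.0, 4.0)
DATA_STD = 0.5           # shared std of both modes
NOISE_STD = 2.0          # std of the Gaussian noise added before denoising


def posterior_mean(x_noisy):
    """Exact E[x0 | x_noisy] for the mixture under additive Gaussian noise."""
    total_var = DATA_STD ** 2 + NOISE_STD ** 2
    # Unnormalized log-responsibility of each mode for the noisy point
    # (shared normalizing constants cancel because both modes share a std).
    logs = [
        math.log(w) - (x_noisy - m) ** 2 / (2.0 * total_var)
        for w, m in zip(WEIGHTS, MEANS)
    ]
    top = max(logs)
    resp = [math.exp(v - top) for v in logs]
    norm = sum(resp)
    shrink = DATA_STD ** 2 / total_var
    # Per-mode posterior means, averaged by responsibility.
    return sum(
        (r / norm) * (m + shrink * (x_noisy - m))
        for r, m in zip(resp, MEANS)
    )


def reconstruction_gap(x0, n_draws=2000, seed=0):
    """Average |x0 - denoise(x0 + noise)| over random noise draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        x_noisy = x0 + NOISE_STD * rng.gauss(0.0, 1.0)
        total += abs(x0 - posterior_mean(x_noisy))
    return total / n_draws


majority_gap = reconstruction_gap(MEANS[0])
minority_gap = reconstruction_gap(MEANS[1])
print(f"majority gap: {majority_gap:.3f}, minority gap: {minority_gap:.3f}")
```

In this toy setting the denoiser pulls noised minority points toward the majority mode, so the minority point has the larger reconstruction gap. A gap of this kind is one plausible way to read "a metric that describes the uniqueness of a given sample", though the abstract does not specify the actual definition.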