Data unlearning aims to remove the influence of specific training samples from a trained model without requiring full retraining. Unlike concept unlearning, data unlearning in diffusion models remains underexplored and often suffers from quality degradation or incomplete forgetting. To address this, we first observe that most existing methods attempt to unlearn samples equally across all diffusion time steps, leading to poor-quality generation. We argue that forgetting occurs disproportionately across time steps and frequency bands, depending on the model and scenario. By selectively focusing on specific time-frequency ranges during training, we obtain samples with higher aesthetic quality and lower noise. We validate this improvement by applying our time-frequency selective approach in diverse settings, including gradient-based and preference-optimization objectives, as well as both image-level and text-to-image tasks. Finally, to evaluate both the deletion and the quality of unlearned data samples, we propose a simple normalized version of SSCD. Together, our analysis and methods establish a clearer understanding of the unique challenges of data unlearning in diffusion models and provide practical strategies to improve both evaluation and unlearning performance.
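The abstract only names the mechanism, so the following is a minimal, illustrative PyTorch sketch of what a time-frequency selective unlearning loss could look like: the forget-set denoising error is compared only inside a chosen radial frequency band (via a 2-D FFT) and only at time steps inside a chosen window, then negated so the optimizer ascends on the forget samples. The function names (`band_mask`, `tf_selective_loss`), the default ranges, and the gradient-ascent-style sign are assumptions made for illustration, not the paper's actual objective.

```python
import torch
import torch.fft


def band_mask(h, w, lo_frac, hi_frac, device=None):
    """Boolean mask selecting a radial frequency band in the 2-D FFT plane.

    lo_frac / hi_frac are fractions of the maximum frequency radius
    (0 = DC component, 1 = highest spatial frequency).
    """
    fy = torch.fft.fftfreq(h, device=device)
    fx = torch.fft.fftfreq(w, device=device)
    radius = torch.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    r = radius / radius.max()
    return (r >= lo_frac) & (r <= hi_frac)


def tf_selective_loss(eps_pred, eps_true, t, t_range=(0.3, 0.8), f_range=(0.0, 0.5)):
    """Unlearning loss restricted to a time window and a frequency band.

    eps_pred, eps_true: (B, C, H, W) predicted / target noise residuals.
    t: (B,) diffusion time steps normalized to [0, 1].
    t_range, f_range: hypothetical selection windows; in the paper's framing
    these would depend on the model and scenario.
    """
    b, c, h, w = eps_pred.shape
    # Keep only samples whose time step falls inside the selected window.
    t_keep = ((t >= t_range[0]) & (t <= t_range[1])).float()  # (B,)
    # Compare residuals only inside the selected frequency band.
    mask = band_mask(h, w, *f_range, device=eps_pred.device)  # (H, W)
    diff = torch.fft.fft2(eps_pred - eps_true)
    band_err = (diff.abs() ** 2 * mask).sum(dim=(1, 2, 3)) / mask.sum().clamp(min=1)
    # Negate so the optimizer *ascends* on the forget set (gradient-ascent-style
    # unlearning); a preference-optimization objective could reuse band_err instead.
    return -(t_keep * band_err).sum() / t_keep.sum().clamp(min=1.0)
```

Under this sketch, the band and time window act as the "time-frequency range" the abstract refers to: shrinking them concentrates the forgetting signal where it matters for a given model, leaving the remaining time steps and frequencies to preserve generation quality.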