We tackle the problem of feature unlearning from a pretrained image generative model. Unlike the common unlearning setting, where the unlearning target is a subset of the training set, we aim to unlearn a specific feature, such as hairstyle in facial images, from a pretrained generative model. Because the target feature appears only in a local region of an image, unlearning entire images from the pretrained model may destroy details in the remaining regions. To specify which feature to unlearn, we develop an implicit feedback mechanism in which a user selects images containing the target feature. From this implicit feedback, we identify a latent representation corresponding to the target feature and then use that representation to unlearn the feature from the generative model. Our framework generalizes to two well-known families of generative models: GANs and VAEs. Through experiments on the MNIST and CelebA datasets, we show that target features are successfully removed while preserving the fidelity of the original models.
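The implicit-feedback step described above can be illustrated with a minimal sketch: given latent codes of user-selected images (containing the target feature) and of unselected images, one can estimate a latent direction for the feature as the difference of set means, then project it out of a latent code. All function names and the difference-of-means estimator here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np


def feature_direction(selected, unselected):
    """Estimate a unit latent direction for the target feature
    from user-selected vs. unselected latent codes (assumed estimator)."""
    d = selected.mean(axis=0) - unselected.mean(axis=0)
    return d / np.linalg.norm(d)


def remove_feature(z, direction):
    """Project a latent code onto the subspace orthogonal to the feature."""
    return z - np.dot(z, direction) * direction


# Toy demonstration with random stand-in latent codes.
rng = np.random.default_rng(0)
selected = rng.normal(size=(16, 8))    # codes of images with the feature
unselected = rng.normal(size=(16, 8))  # codes of images without it
direction = feature_direction(selected, unselected)

z = rng.normal(size=8)
z_clean = remove_feature(z, direction)
# z_clean has (numerically) zero component along the estimated direction.
```

In practice the paper's unlearning step updates the generative model itself rather than individual latent codes; this sketch only conveys how a single feature direction can be isolated from binary user feedback.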