Large pre-trained generative models are known to occasionally output undesirable samples, which undermines their trustworthiness. The common mitigation is to re-train the model from scratch with different data or different regularization -- which consumes substantial computational resources and does not always fully address the problem. In this work, we take a different, more compute-friendly approach and investigate how to post-edit a model after training so that it ``redacts'', or refrains from outputting, certain kinds of samples. We show that redaction is a fundamentally different task from data deletion, and that data deletion does not always lead to redaction. We then consider Generative Adversarial Networks (GANs) and provide three different data-redaction algorithms that differ in how the samples to be redacted are described. Extensive evaluations on real-world image datasets show that our algorithms outperform data-deletion baselines and can redact data while retaining high generation quality, at a fraction of the cost of full re-training.
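As a rough illustration of the general idea (not the paper's exact procedure), one way to post-edit a pre-trained GAN when the redaction set is described by a classifier is to fine-tune the generator with an added penalty on samples the classifier flags, alongside the usual adversarial loss. The sketch below assumes PyTorch; the names `redact_clf` and `lam`, the toy architectures, and the specific loss combination are illustrative assumptions, not the algorithms proposed in this work.

```python
import torch
import torch.nn as nn

def redaction_finetune_step(G, D, redact_clf, opt_G, z_dim, batch=64, lam=1.0):
    """One generator fine-tuning step: keep the usual non-saturating
    adversarial loss, and add a penalty proportional to the probability
    mass the generator places on the region flagged for redaction.
    (Hypothetical sketch; not the paper's method.)"""
    z = torch.randn(batch, z_dim)
    x_fake = G(z)
    adv_loss = -torch.log(torch.sigmoid(D(x_fake)) + 1e-8).mean()
    redact_penalty = redact_clf(x_fake).mean()  # assumes classifier outputs in [0, 1]
    loss = adv_loss + lam * redact_penalty      # lam trades generation quality vs. redaction
    opt_G.zero_grad()
    loss.backward()
    opt_G.step()
    return loss.item()

# Toy usage with stand-in networks (not real image-GAN architectures):
z_dim, x_dim = 16, 32
G = nn.Sequential(nn.Linear(z_dim, 64), nn.ReLU(), nn.Linear(64, x_dim))
D = nn.Sequential(nn.Linear(x_dim, 64), nn.ReLU(), nn.Linear(64, 1))
redact_clf = nn.Sequential(nn.Linear(x_dim, 1), nn.Sigmoid())
opt_G = torch.optim.Adam(G.parameters(), lr=1e-4)
redaction_finetune_step(G, D, redact_clf, opt_G, z_dim)
```

Because only the generator's optimizer steps, the pre-trained discriminator and the redaction classifier stay fixed while gradients flow through them to shape what the generator produces; this is what makes post-editing far cheaper than re-training from scratch.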