Generative adversarial networks (GANs) are known for their strong ability to capture the underlying distribution of training instances. Since the seminal GAN work, many variants have been proposed. However, almost all existing GANs assume that the training dataset is clean. In many real-world applications this assumption does not hold: the training dataset may be contaminated by a proportion of undesired instances. When trained on such datasets, existing GANs learn a mixture distribution of desired and contaminated instances rather than the distribution of the desired data alone (the target distribution). To learn the target distribution from contaminated datasets, we develop two purified generative adversarial networks (PuriGANs), in which the discriminators are augmented with the capability to distinguish between target and contaminated instances by leveraging an extra dataset composed solely of contamination instances. We prove that, under some mild conditions, the proposed PuriGANs are guaranteed to converge to the distribution of desired instances. Experimental results on several datasets demonstrate that, when trained on contaminated datasets, the proposed PuriGANs generate much better images from the desired distribution than comparable baselines. In addition, we demonstrate the usefulness of PuriGAN in downstream applications by applying it to semi-supervised anomaly detection on contaminated datasets and to PU-learning. Experimental results show that PuriGAN outperforms comparable baselines on both tasks.
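The contamination setting described above can be made concrete with a small amount of notation. The symbols below ($p_t$, $p_c$, $\gamma$, $p_g$) are assumptions introduced here for illustration and are not taken from the paper; the sketch states only the problem setup and the well-known optimum of a standard GAN, not the PuriGAN objective itself.

```latex
% Assumed notation (illustrative only):
%   p_t    : target distribution (desired instances)
%   p_c    : contamination distribution (undesired instances)
%   \gamma : contamination proportion, 0 \le \gamma < 1
%   p_g    : distribution induced by the generator
\begin{align}
  p_{\mathrm{data}}(x) &= (1-\gamma)\, p_t(x) + \gamma\, p_c(x)
    && \text{(contaminated training distribution)} \\
  p_g^{\mathrm{GAN}}(x) &= p_{\mathrm{data}}(x)
    && \text{(global optimum of a standard GAN)} \\
  p_g^{\mathrm{goal}}(x) &= p_t(x)
    && \text{(what a purified GAN aims to recover)}
\end{align}
```

Under this reading, a standard GAN trained on the contaminated set reproduces the mixture whenever $\gamma > 0$, which is why an extra set of samples from $p_c$ is used to help the discriminator separate the two components.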