We investigate the training and performance of generative adversarial networks using the Maximum Mean Discrepancy (MMD) as critic, termed MMD GANs. As our main theoretical contribution, we clarify the situation with bias in GAN loss functions raised by recent work: we show that gradient estimators used in the optimization process for both MMD GANs and Wasserstein GANs are unbiased, but learning a discriminator based on samples leads to biased gradients for the generator parameters. We also discuss the issue of kernel choice for the MMD critic, and characterize the kernel corresponding to the energy distance used for the Cramer GAN critic. Being an integral probability metric, the MMD benefits from training strategies recently developed for Wasserstein GANs. In experiments, the MMD GAN is able to employ a smaller critic network than the Wasserstein GAN, resulting in a simpler and faster-training algorithm with matching performance. We also propose an improved measure of GAN convergence, the Kernel Inception Distance, and show how to use it to dynamically adapt learning rates during GAN training.
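As a concrete illustration, the sketch below shows an unbiased estimator of the squared MMD between two samples, together with the cubic polynomial kernel that underlies the Kernel Inception Distance; this is a minimal NumPy sketch under our own naming, not the paper's reference code, and it assumes the inputs are precomputed feature vectors (for KID, Inception features of real and generated images).

```python
import numpy as np

def polynomial_kernel(X, Y, degree=3, gamma=None, coef0=1.0):
    """Polynomial kernel k(x, y) = (gamma * <x, y> + coef0) ** degree.

    With degree=3, gamma=1/d, coef0=1 this is the kernel typically used
    for the Kernel Inception Distance.
    """
    if gamma is None:
        gamma = 1.0 / X.shape[1]
    return (gamma * X @ Y.T + coef0) ** degree

def mmd2_unbiased(X, Y, kernel=polynomial_kernel):
    """Unbiased estimate of squared MMD between samples X (m, d) and Y (n, d).

    The within-sample terms drop the diagonal of the kernel matrices,
    which is what makes the estimator unbiased.
    """
    m, n = X.shape[0], Y.shape[0]
    Kxx = kernel(X, X)
    Kyy = kernel(Y, Y)
    Kxy = kernel(X, Y)
    term_xx = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_yy = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    term_xy = 2.0 * Kxy.mean()
    return term_xx + term_yy - term_xy
```

Used on Inception features of real and generated images, this estimate (lower is better) plays the role the abstract assigns to the Kernel Inception Distance: unlike the FID, it admits an unbiased estimator, which is what makes it usable as a convergence measure for adapting learning rates during training.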