This paper proposes a new theoretical lens for viewing Wasserstein generative adversarial networks (WGANs). To minimize the Wasserstein-1 distance between the true data distribution and our estimate of it, we derive a distribution-dependent ordinary differential equation (ODE) that represents the gradient flow of the Wasserstein-1 loss, and we show that a forward Euler discretization of this ODE converges. This inspires a new class of generative models, which we call W1-FE, that naturally integrates persistent training. When persistent training is turned off, we prove that W1-FE reduces to WGAN. As persistent training is intensified, W1-FE outperforms WGAN in experiments ranging from low to high dimensions, in terms of both convergence speed and training results. Intriguingly, these benefits are reaped only when persistent training is carefully integrated through our ODE perspective: as demonstrated numerically, naively adding persistent training to WGAN (without relying on our ODE framework) can significantly worsen training results.
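For concreteness, the following is a minimal sketch of the type of scheme described above, under the assumption that the gradient flow is driven by the Kantorovich potential $\varphi_{\mu_t}$ (the optimal WGAN critic) between the current estimate $\mu_t$ and the data distribution $\mu_d$, and that the forward Euler step uses a step size $\epsilon>0$; the notation is illustrative and need not match the paper's exact formulation:
\begin{align}
  \frac{\mathrm{d}Y_t}{\mathrm{d}t} &= -\nabla \varphi_{\mu_t}(Y_t),
  \qquad \mu_t := \mathrm{Law}(Y_t), \\
  Y_{k+1} &= Y_k - \epsilon\, \nabla \varphi_{\mu_k}(Y_k),
  \qquad \mu_k := \mathrm{Law}(Y_k).
\end{align}
In an iteration of this kind, persistent training would correspond to fitting the generator to the Euler-updated samples over $K \ge 1$ inner passes, with $K=1$ recovering the standard WGAN generator update.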