In this paper, we show that how well a Wasserstein GAN approximates a target distribution depends on both the width and depth (capacity) of its generator and discriminator and on the number of training samples. We develop a quantitative generalization bound on the Wasserstein distance between the generated distribution and the target distribution. It implies that, given sufficiently many training samples, a Wasserstein GAN whose generator and discriminator have suitable width and depth can approximate the target distribution well. We find that the discriminator suffers from the curse of dimensionality, meaning that GANs place higher capacity requirements on discriminators than on generators, which is consistent with the theory in arXiv:1703.00573v5 [cs.LG]. More importantly, an overly deep (high-capacity) generator may yield worse results after training than a low-capacity one if the discriminator is not strong enough. Unlike the original Wasserstein GAN of arXiv:1701.07875v3 [stat.ML], we adopt GroupSort neural networks (arXiv:1811.05381v2 [cs.LG]) in the model for their better approximation of 1-Lipschitz functions. Compared with some existing generalization (convergence) analyses of GANs, we expect our results to be more broadly applicable.
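The 1-Lipschitz requirement comes from the Kantorovich-Rubinstein duality that the Wasserstein GAN critic optimizes, W_1(\mu, \nu) = \sup_{\|f\|_{Lip} \le 1} E_{x \sim \mu}[f(x)] - E_{y \sim \nu}[f(y)]. As a concrete illustration of the discriminator architecture referenced above, the following is a minimal sketch (not the authors' code) of the GroupSort activation of arXiv:1811.05381v2: units are split into fixed-size groups and sorted within each group, with group size 2 recovering the common MaxMin variant. The layer widths and group size below are illustrative assumptions, and the weight-norm constraints needed for a fully 1-Lipschitz critic are omitted for brevity.

    import torch
    import torch.nn as nn

    class GroupSort(nn.Module):
        """GroupSort activation: sort units within fixed-size groups."""
        def __init__(self, group_size=2):
            super().__init__()
            self.group_size = group_size

        def forward(self, x):
            n, d = x.shape
            # Sorting permutes each group's entries, so the activation is
            # norm-preserving and 1-Lipschitz; combined with norm-constrained
            # weight matrices this keeps the whole critic 1-Lipschitz.
            grouped = x.view(n, d // self.group_size, self.group_size)
            return grouped.sort(dim=-1).values.view(n, d)

    # Illustrative critic (widths are assumptions, constraints omitted).
    critic = nn.Sequential(
        nn.Linear(10, 64), GroupSort(2),
        nn.Linear(64, 64), GroupSort(2),
        nn.Linear(64, 1),
    )
    scores = critic(torch.randn(8, 10))  # shape: (8, 1)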