We propose a method to optimize the representation and distinguishability of samples from two probability distributions by maximizing the estimated power of a statistical test based on the maximum mean discrepancy (MMD). We apply this optimized MMD to unsupervised learning with generative adversarial networks (GANs), in which a model attempts to generate realistic samples and a discriminator attempts to tell these apart from data samples. In this context, the MMD can serve two roles. First, it can act as a discriminator, applied either directly to the samples or to features of the samples. Second, it can be used to evaluate the performance of a generative model by testing the model's samples against a reference data set. In the latter role, the optimized MMD is particularly helpful, as it gives an interpretable indication of how the model and data distributions differ, even in cases where individual model samples are not easily distinguished either by eye or by classifier.
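To make the evaluation role concrete, the following is a minimal sketch of an MMD two-sample test between model samples and reference data. It is not the power-optimization procedure proposed here: it assumes a Gaussian kernel with a fixed, hand-picked `bandwidth`, uses the standard unbiased estimator of the squared MMD, and calibrates the test with a permutation null. The function names (`gaussian_kernel`, `mmd2_unbiased`, `permutation_p_value`) and all parameter defaults are illustrative choices, not taken from the paper.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    # Squared Euclidean distances between every row of a and every row of b.
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def mmd2_unbiased(x, y, bandwidth=1.0):
    """Unbiased estimate of the squared MMD between samples x and y."""
    m, n = len(x), len(y)
    k_xx = gaussian_kernel(x, x, bandwidth)
    k_yy = gaussian_kernel(y, y, bandwidth)
    k_xy = gaussian_kernel(x, y, bandwidth)
    # Exclude the diagonal terms k(x_i, x_i) from the within-sample averages.
    term_xx = (k_xx.sum() - np.trace(k_xx)) / (m * (m - 1))
    term_yy = (k_yy.sum() - np.trace(k_yy)) / (n * (n - 1))
    return term_xx + term_yy - 2.0 * k_xy.mean()


def permutation_p_value(x, y, bandwidth=1.0, n_permutations=200, seed=0):
    """Permutation-test p-value for the null hypothesis that x and y
    come from the same distribution (assumed calibration scheme)."""
    rng = np.random.default_rng(seed)
    observed = mmd2_unbiased(x, y, bandwidth)
    pooled = np.vstack([x, y])
    m = len(x)
    count = 0
    for _ in range(n_permutations):
        # Randomly reassign the pooled points to two groups of the original sizes.
        perm = rng.permutation(len(pooled))
        stat = mmd2_unbiased(pooled[perm[:m]], pooled[perm[m:]], bandwidth)
        count += stat >= observed
    # Add-one correction keeps the p-value strictly positive.
    return (count + 1) / (n_permutations + 1)
```

In use, `x` would hold held-out data samples and `y` samples drawn from the generative model (or features of both), each as an array of shape `(num_samples, dim)`. A small p-value suggests the test can distinguish the two sets. The fixed bandwidth is the weak point of this sketch: choosing the kernel to maximize estimated test power, as the abstract describes, is precisely what this simple version omits.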