Probabilistic generative models provide a powerful framework for representing data that avoids the expense of manual annotation typically needed by discriminative approaches. Model selection in this generative setting can be challenging, however, particularly when likelihoods are not easily accessible. To address this issue, we introduce a statistical test of relative similarity, which is used to determine which of two models generates samples that are significantly closer to a real-world reference dataset of interest. We use as our test statistic the difference in maximum mean discrepancies (MMDs) between the reference dataset and each model dataset, and derive a powerful, low-variance test based on the joint asymptotic distribution of the MMDs between each reference-model pair. In experiments on deep generative models, including the variational auto-encoder and generative moment matching network, the tests provide a meaningful ranking of model performance as a function of parameter and training settings.
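As a rough illustration of the test statistic described above, the sketch below computes unbiased squared-MMD estimates with an RBF kernel between a reference sample and samples from two hypothetical models, then takes their difference. This is a minimal sketch, not the paper's implementation: the kernel bandwidth, sample sizes, and the synthetic Gaussian "models" are all assumptions for demonstration, and the variance-based threshold from the joint asymptotic distribution is omitted.

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    # Gaussian (RBF) kernel matrix from pairwise squared distances
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-d2 / (2 * sigma**2))

def mmd2_unbiased(X, Y, sigma=1.0):
    # Unbiased estimate of squared MMD: drop diagonal (self-pair)
    # terms of the within-sample kernel matrices
    m, n = len(X), len(Y)
    Kxx = rbf_kernel(X, X, sigma)
    Kyy = rbf_kernel(Y, Y, sigma)
    Kxy = rbf_kernel(X, Y, sigma)
    term_x = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_x + term_y - 2 * Kxy.mean()

rng = np.random.default_rng(0)
ref = rng.normal(0.0, 1.0, size=(500, 2))      # "real" reference data
model_a = rng.normal(0.1, 1.0, size=(500, 2))  # samples close to reference
model_b = rng.normal(2.0, 1.0, size=(500, 2))  # samples far from reference

# Test statistic: difference of MMDs between each reference-model pair.
# A negative value favors model_a (smaller discrepancy to the reference).
stat = mmd2_unbiased(ref, model_a) - mmd2_unbiased(ref, model_b)
print(stat < 0)
```

The full test in the paper additionally estimates the joint variance of the two (correlated) MMD estimates, since both share the same reference sample, to obtain a calibrated p-value rather than a raw sign comparison.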