Generative models have achieved remarkable success across a range of applications, yet their evaluation still lacks principled uncertainty quantification. In this paper, we develop a method for comparing how close different generative models are to the underlying distribution of test samples. In particular, our approach employs the Kullback-Leibler (KL) divergence to measure the discrepancy between a generative model and the unknown test distribution: KL requires no tuning parameters, such as the kernels used by RKHS-based distances, and is the only $f$-divergence that admits a crucial cancellation enabling uncertainty quantification. We further extend our method to comparing conditional generative models and leverage Edgeworth expansions to address limited-data settings. On simulated datasets with known ground truth, we show that our approach attains valid coverage rates and has higher power than kernel-based methods. Applied to generative models on image and text datasets, our procedure yields conclusions consistent with benchmark metrics while additionally providing statistical confidence.
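To sketch the cancellation referred to above (a minimal illustration, assuming the candidate models admit densities $p_1$ and $p_2$ and writing $p^\ast$ for the unknown test distribution):
\[
\mathrm{KL}(p^\ast \,\|\, p_1) - \mathrm{KL}(p^\ast \,\|\, p_2)
= \mathbb{E}_{X \sim p^\ast}\!\left[\log \frac{p_2(X)}{p_1(X)}\right],
\]
since the unknown entropy term $\mathbb{E}_{p^\ast}[\log p^\ast(X)]$ cancels. The right-hand side is an expectation under the test distribution alone, so it can be estimated by the sample average of $\log p_2(X_i) - \log p_1(X_i)$ over test samples, to which standard central-limit arguments apply for confidence intervals.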