Parametric adversarial divergences, a generalization of the losses used to train generative adversarial networks (GANs), are often described as approximations of their nonparametric counterparts, such as the Jensen-Shannon divergence, which can be recovered under the so-called optimal-discriminator assumption. In this position paper, we argue that despite being "non-optimal", parametric divergences have properties distinct from their nonparametric counterparts that can make them more suitable for learning high-dimensional distributions. A key property is that a parametric divergence is only sensitive to certain aspects/moments of the distribution, which depend on the architecture of the discriminator and the loss it is trained with. In contrast, nonparametric divergences such as the Kullback-Leibler divergence are sensitive to moments ignored by the discriminator, but such divergences do not necessarily correlate with sample quality (Theis et al., 2016). Similarly, we show that mutual information can lead to unintuitive interpretations, and we explore more intuitive alternatives based on parametric divergences. We conclude that parametric divergences offer a flexible framework for defining statistical quantities relevant to a specific modeling task.
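For concreteness, here is a minimal sketch of the distinction; the notation below does not appear in the abstract and follows the standard GAN formulation, with $p$ denoting the data distribution, $q$ the model distribution, and $D$ a discriminator. The nonparametric divergence takes a supremum over all discriminators $D:\mathcal{X}\to(0,1)$, and under the optimal-discriminator assumption it recovers the Jensen-Shannon divergence up to an affine transformation,
\[
\Delta(p \,\|\, q) \;=\; \sup_{D} \; \mathbb{E}_{x\sim p}\!\left[\log D(x)\right] + \mathbb{E}_{x\sim q}\!\left[\log\!\big(1 - D(x)\big)\right] \;=\; 2\,\mathrm{JS}(p \,\|\, q) - \log 4 .
\]
The corresponding parametric adversarial divergence restricts the supremum to a parametric family of discriminators $\{D_\theta : \theta \in \Theta\}$, for instance those realizable by a fixed neural architecture,
\[
\Delta_{\Theta}(p \,\|\, q) \;=\; \sup_{\theta \in \Theta} \; \mathbb{E}_{x\sim p}\!\left[\log D_\theta(x)\right] + \mathbb{E}_{x\sim q}\!\left[\log\!\big(1 - D_\theta(x)\big)\right],
\]
so that the divergence is only sensitive to the aspects of $p$ and $q$ that discriminators in the family can distinguish.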