A Method for Evaluating Deep Generative Models of Images via Assessing the Reproduction of High-order Spatial Context

From arXiv: the paper is under consideration at Pattern Recognition Letters. An early version with preliminary results was accepted for poster presentation at SPIE-MI 2022. The arXiv version contains new and updated designs of stochastic models, their mathematical representations, and the corresponding results. Data from the designed ensembles are available at https://doi.org/10.7910/DVN/HHF4AF

Deep generative models (DGMs) have the potential to revolutionize diagnostic imaging. Generative adversarial networks (GANs) are one widely employed kind of DGM. The overarching problem with deploying GANs, and other DGMs, in any application that requires domain expertise in order to actually use the generated images is that there is generally no adequate or automatic means of assessing the domain-relevant quality of generated images. In this work, we demonstrate several objective tests of images output by two popular GAN architectures. We designed several stochastic context models (SCMs) of distinct image features that can be recovered after generation by a trained GAN. Several of these features are high-order, algorithmic pixel-arrangement rules which are not readily expressed in covariance matrices. We designed and validated statistical classifiers to detect specific effects of the known arrangement rules. We then tested the rates at which two different GANs correctly reproduced the feature context under a variety of training scenarios and degrees of feature-class similarity. We found that ensembles of generated images can appear largely accurate visually, and show high accuracy in ensemble measures, while not exhibiting the known spatial arrangements. Furthermore, GANs trained on a spectrum of distinct spatial orders did not respect the given prevalence of those orders in the training data. The main conclusion is that SCMs can be engineered to quantify numerous errors, per image, that may not be captured in ensemble statistics but plausibly can affect subsequent use of the GAN-generated images.

Summary: The paper addresses the lack of automatic, domain-relevant quality checks for images produced by GANs and other DGMs. The authors build stochastic context models (SCMs) whose pixel-arrangement rules are known in advance, train two GAN architectures on them, and use validated statistical classifiers to measure how often each generated image actually obeys the rules. Generated ensembles can look right and score well on ensemble measures while individual images break the known spatial arrangements, and the GANs do not preserve the prevalence of the different spatial orders found in the training data.
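
The evaluation idea in the abstract can be illustrated with a toy example. The sketch below is not one of the paper's SCMs. It assumes a made-up rule (one bright pixel per row, with the column advancing by a fixed shift from row to row), a rule checker standing in for the paper's statistical classifiers, and a hypothetical `stand_in_generator` that plays the role of a trained GAN. All names, sizes, and rates are assumptions chosen for illustration. It shows how a per-image check can expose errors that an ensemble statistic such as mean intensity cannot see.

```python
import numpy as np

SIZE = 8             # toy image side length (assumption)
SHIFTS = (1, 2, 3)   # hypothetical feature classes: column shift per row


def sample_scm_image(shift, rng):
    """Draw one binary image that obeys the toy rule for class `shift`."""
    img = np.zeros((SIZE, SIZE), dtype=np.uint8)
    col = rng.integers(SIZE)  # random starting column
    for row in range(SIZE):
        img[row, col] = 1
        col = (col + shift) % SIZE
    return img


def classify(img):
    """Return the shift class whose rule the image obeys, or None."""
    if not np.all(img.sum(axis=1) == 1):
        return None  # rule requires exactly one bright pixel per row
    cols = img.argmax(axis=1)
    steps = np.diff(cols) % SIZE  # column advance between consecutive rows
    if np.all(steps == steps[0]) and steps[0] in SHIFTS:
        return int(steps[0])
    return None


def stand_in_generator(n, rng, error_rate=0.3):
    """Hypothetical stand-in for a trained GAN: it ignores the training
    prevalence and breaks the rule in a fraction of its images."""
    images = []
    for _ in range(n):
        img = sample_scm_image(rng.choice(SHIFTS), rng)
        if rng.random() < error_rate:
            row = rng.integers(SIZE)
            # Displace one pixel sideways; total intensity is unchanged.
            img[row] = np.roll(img[row], 1)
        images.append(img)
    return images


def evaluate(images):
    """Per-image rule compliance rate and class prevalence among valid images."""
    labels = [classify(im) for im in images]
    valid = [lab for lab in labels if lab is not None]
    rate = len(valid) / len(labels)
    prevalence = {
        s: (valid.count(s) / len(valid) if valid else 0.0) for s in SHIFTS
    }
    return rate, prevalence


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Training ensemble with a known, uneven class prevalence.
    train = [
        sample_scm_image(rng.choice(SHIFTS, p=[0.7, 0.2, 0.1]), rng)
        for _ in range(1000)
    ]
    generated = stand_in_generator(1000, rng)

    for name, ensemble in (("train", train), ("generated", generated)):
        rate, prevalence = evaluate(ensemble)
        mean_intensity = float(np.mean(ensemble))  # an ensemble statistic
        print(name, "mean intensity:", mean_intensity)
        print(name, "rule compliance:", rate, "prevalence:", prevalence)
```

In this toy setup the mean intensity is the same for both ensembles by construction, so only the per-image compliance rate and the recovered class prevalence reveal that the stand-in generator misbehaves. With a real GAN, `stand_in_generator` would be replaced by samples from the trained model, and `classify` by a classifier validated on the real SCM.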