The Geometry of Deep Generative Image Models and its Applications

Generative adversarial networks (GANs) have emerged as a powerful unsupervised method to model the statistical patterns of real-world data sets, such as natural images. These networks are trained to map random inputs in their latent space to new samples representative of the learned data. However, the structure of the latent space is hard to intuit due to its high dimensionality and the non-linearity of the generator, which limits the usefulness of the models. Understanding the latent space requires a way to identify input codes for existing real-world images (inversion), and a way to identify directions with known image transformations (interpretability). Here, we use a geometric framework to address both issues simultaneously. We develop an architecture-agnostic method to compute the Riemannian metric of the image manifold created by GANs. The eigen-decomposition of the metric isolates axes that account for different levels of image variability. An empirical analysis of several pretrained GANs shows that image variation around each position is concentrated along surprisingly few major axes (the space is highly anisotropic) and the directions that create this large variation are similar at different positions in the space (the space is homogeneous). We show that many of the top eigenvectors correspond to interpretable transforms in the image space, with a substantial part of the eigenspace corresponding to minor transforms which could be compressed out. This geometric understanding unifies key previous results related to GAN interpretability. We show that the use of this metric allows for more efficient optimization in the latent space (e.g., GAN inversion) and facilitates unsupervised discovery of interpretable axes. Our results illustrate that defining the geometry of the GAN image manifold can serve as a general framework for understanding GANs.
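
To make the idea of a metric on the latent space concrete, the following is a minimal sketch under stated assumptions, not the method used in this work. It assumes a small hypothetical generator (`toy_generator`, a two-layer tanh network with random weights) in place of a pretrained GAN, and it assumes squared Euclidean distance between output images, for which the induced metric at a latent code z is H = JᵀJ, with J the Jacobian of the generator at z. The abstract does not specify the image distance, so that choice is illustrative only. The eigen-decomposition of H then ranks latent directions by how much they change the output.

```python
import numpy as np


def toy_generator(z, W1, W2):
    """Hypothetical stand-in for a GAN generator: latent vector -> flattened image."""
    return np.tanh(W2 @ np.tanh(W1 @ z))


def jacobian_fd(f, z, eps=1e-5):
    """Central finite-difference Jacobian of f at z, shape (output_dim, latent_dim)."""
    cols = []
    for i in range(z.size):
        dz = np.zeros_like(z)
        dz[i] = eps
        cols.append((f(z + dz) - f(z - dz)) / (2 * eps))
    return np.stack(cols, axis=1)


def pullback_metric(f, z):
    """Metric H = J^T J induced on the latent space by squared L2 image distance (assumed)."""
    J = jacobian_fd(f, z)
    return J.T @ J


# Random weights for the toy generator (assumption: no real GAN is loaded here).
rng = np.random.default_rng(0)
latent_dim, hidden_dim, image_dim = 8, 16, 64
W1 = rng.normal(size=(hidden_dim, latent_dim)) / np.sqrt(latent_dim)
W2 = rng.normal(size=(image_dim, hidden_dim)) / np.sqrt(hidden_dim)


def G(z):
    return toy_generator(z, W1, W2)


# Metric at one latent position and its eigen-decomposition.
z0 = rng.normal(size=latent_dim)
H = pullback_metric(G, z0)
eigvals, eigvecs = np.linalg.eigh(H)                 # eigh returns ascending order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # reorder to descending

# Fraction of total image variation captured by the top-k eigen-directions.
explained = np.cumsum(eigvals) / np.sum(eigvals)

# Compare image change for equal-sized steps along the top and bottom eigenvectors.
step = 1e-3
step_top = np.sum((G(z0 + step * eigvecs[:, 0]) - G(z0)) ** 2)
step_bottom = np.sum((G(z0 + step * eigvecs[:, -1]) - G(z0)) ** 2)

print("eigenvalues (descending):", eigvals)
print("cumulative explained fraction:", explained)
print("squared image change, top vs bottom direction:", step_top, step_bottom)
```

In this sketch, a spectrum that decays quickly would correspond to the anisotropy described above, and repeating the computation at several latent codes and comparing the top eigenvectors would be one way to probe homogeneity. For a real GAN, the finite-difference Jacobian would normally be replaced by automatic differentiation, and the image distance by whichever distance the analysis calls for.
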
