Representational aspects of depth and conditioning in normalizing flows

Normalizing flows are among the most popular paradigms in generative modeling, especially for images, primarily because the likelihood of a data point can be evaluated efficiently. This is desirable both for assessing the fit of a model and for ease of training, since the likelihood can be maximized by gradient descent. However, training normalizing flows comes with difficulties as well: models that produce good samples typically need to be extremely deep, which brings the accompanying vanishing/exploding gradient problems. A closely related problem is that they are often poorly conditioned: since they are parametrized as invertible maps from $\mathbb{R}^d$ to $\mathbb{R}^d$, while typical training data such as images is intuitively lower-dimensional, the learned maps often have Jacobians that are close to singular. In our paper, we tackle representational aspects of depth and conditioning in normalizing flows, both for general invertible architectures and for a particular common architecture, affine couplings. We prove that $\Theta(1)$ affine coupling layers suffice to exactly represent a permutation or $1 \times 1$ convolution, as used in GLOW, showing that, representationally, the choice of partition is not a bottleneck for depth. We also show that shallow affine coupling networks are universal approximators in Wasserstein distance if ill-conditioning is allowed, and we experimentally investigate related phenomena involving padding. Finally, we show a depth lower bound for general flow architectures with few neurons per layer and bounded Lipschitz constant.
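The abstract relies on two properties of affine coupling flows: the likelihood is exact and cheap through the change-of-variables formula $\log p_X(x) = \log p_Z(f(x)) + \log\lvert\det J_f(x)\rvert$, and the Jacobian $J_f$ can become ill-conditioned. The following is a minimal sketch under stated assumptions, not the paper's construction. It assumes RealNVP-style couplings whose scale and shift functions are small linear maps (stand-ins for neural networks), a standard Gaussian prior on $z$, and a fixed coordinate reversal between layers in place of the permutation or $1 \times 1$ convolution. The names `make_coupling`, `flow_forward`, `flow_inverse`, `log_likelihood`, and `jacobian` are hypothetical.

```python
import numpy as np

# Minimal sketch of an affine coupling flow on R^d.
# Assumptions (not from the paper): s(.) and t(.) are small linear maps
# instead of neural networks, and the halves are exchanged by reversing
# the coordinate order between layers.

def make_coupling(d, rng):
    k = d // 2
    Ws = rng.normal(scale=0.5, size=(d - k, k))  # hypothetical scale weights
    Wt = rng.normal(size=(d - k, k))             # hypothetical shift weights

    def forward(x):
        x1, x2 = x[:k], x[k:]
        s = np.tanh(Ws @ x1)                 # log-scale depends only on x1
        y2 = x2 * np.exp(s) + Wt @ x1        # affine map of x2
        # The Jacobian is block-triangular with diagonal (1, ..., 1, exp(s)),
        # so log|det J| is simply sum(s).
        return np.concatenate([x1, y2]), np.sum(s)

    def inverse(y):
        y1, y2 = y[:k], y[k:]
        s = np.tanh(Ws @ y1)                 # y1 == x1, so s is recomputable
        x2 = (y2 - Wt @ y1) * np.exp(-s)
        return np.concatenate([y1, x2])

    return forward, inverse

def flow_forward(x, layers):
    """Map data x to latent z, accumulating log|det J| across layers."""
    z, log_det = x, 0.0
    for forward, _ in layers:
        z, ld = forward(z)
        log_det += ld
        z = z[::-1]                          # reversal permutation, |det| = 1
    return z, log_det

def flow_inverse(z, layers):
    """Undo flow_forward layer by layer, in reverse order."""
    x = z
    for _, inverse in reversed(layers):
        x = inverse(x[::-1])
    return x

def log_likelihood(x, layers):
    """Change of variables with a standard Gaussian prior on z."""
    z, log_det = flow_forward(x, layers)
    log_prior = -0.5 * (z @ z) - 0.5 * x.size * np.log(2 * np.pi)
    return log_prior + log_det

def jacobian(x, layers, eps=1e-6):
    """Central finite-difference Jacobian of the forward map."""
    d = x.size
    J = np.empty((d, d))
    for j in range(d):
        e = np.zeros(d)
        e[j] = eps
        J[:, j] = (flow_forward(x + e, layers)[0]
                   - flow_forward(x - e, layers)[0]) / (2 * eps)
    return J

rng = np.random.default_rng(0)
d = 4
layers = [make_coupling(d, rng) for _ in range(3)]
x = rng.normal(size=d)

z, log_det = flow_forward(x, layers)
J = jacobian(x, layers)
print(np.allclose(flow_inverse(z, layers), x))                  # exact inverse
print(np.isclose(np.linalg.slogdet(J)[1], log_det, atol=1e-4))  # analytic log-det
print(log_likelihood(x, layers))                                # exact log p(x)
print(np.linalg.cond(J))                                        # conditioning of J
```

Because each coupling's Jacobian is triangular, the log-determinant costs $O(d)$ per layer instead of the $O(d^3)$ of a generic determinant. The finite-difference Jacobian is only used to check that shortcut and to report the condition number, which is the quantity that grows when a flow's Jacobian approaches singularity.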
