Normalizing flows are invertible neural networks with tractable change-of-volume terms, which allow their parameters to be efficiently optimized via maximum likelihood. However, data of interest are typically assumed to live on some (often unknown) low-dimensional manifold embedded in a high-dimensional ambient space. The result is a modelling mismatch, since the invertibility requirement implies, by construction, that the learned distribution has high-dimensional support. Injective flows, mappings from low- to high-dimensional spaces, aim to fix this discrepancy by learning distributions on manifolds, but the resulting volume-change term becomes more challenging to evaluate. Current approaches either avoid computing this term entirely through various heuristics, or assume the manifold is known beforehand and are therefore not widely applicable. Instead, we propose two methods to tractably calculate the gradient of this term with respect to the parameters of the model, relying on careful use of automatic differentiation and techniques from numerical linear algebra. Both approaches perform end-to-end nonlinear manifold learning and density estimation for data projected onto this manifold. We study the trade-offs between our proposed methods, empirically verify that we outperform approaches that ignore the volume-change term by more accurately learning manifolds and the corresponding distributions on them, and show promising results on out-of-distribution detection. Our code is available at https://github.com/layer6ai-labs/rectangular-flows.
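To make the quantity at stake concrete, the sketch below (a minimal illustration, not the paper's implementation; the map `g`, the parameter `theta`, and the toy dimensions d=2, D=3 are placeholders) computes the rectangular change-of-volume term (1/2) log det(J^T J) of an injective map g: R^d -> R^D exactly via automatic differentiation, and contrasts it with a matrix-free surrogate combining a Hutchinson probe with a conjugate-gradient solve, whose gradient is an unbiased one-sample estimate of the term's gradient, in the spirit of the autodiff-plus-numerical-linear-algebra approach described above.

```python
import jax
import jax.numpy as jnp
from jax.scipy.sparse.linalg import cg

# Toy injective map g: R^2 -> R^3 with a single scalar parameter `theta`;
# a stand-in for a learned injective flow (hypothetical, for illustration).
def g(theta, z):
    return jnp.array([theta * z[0], z[1], z[0] ** 2 + z[1] ** 2])

# Exact change-of-volume term (1/2) log det(J^T J), materializing the full
# D x d Jacobian. Feasible only for small D; shown as ground truth.
def log_volume_exact(theta, z):
    J = jax.jacfwd(g, argnums=1)(theta, z)   # D x d Jacobian of g w.r.t. z
    _, logdet = jnp.linalg.slogdet(J.T @ J)  # d x d Gram matrix
    return 0.5 * logdet

# Matrix-free product (J^T J) v via one JVP and one VJP, so J is never
# materialized; the building block for iterative linear algebra at scale.
def jtj_vec(theta, z, v):
    f = lambda zz: g(theta, zz)
    _, Jv = jax.jvp(f, (z,), (v,))   # J v
    _, vjp_fn = jax.vjp(f, z)
    (JTJv,) = vjp_fn(Jv)             # J^T (J v)
    return JTJv

# Stochastic surrogate: with a = stop_grad((J^T J)^{-1} eps) from CG, the
# gradient of 0.5 * a^T (J^T J) eps w.r.t. theta equals
# 0.5 * eps^T (J^T J)^{-1} d(J^T J)/dtheta eps, whose expectation over
# eps ~ N(0, I) is exactly d/dtheta [(1/2) log det(J^T J)]. Details here
# are one illustrative recipe, not necessarily the paper's exact one.
def log_volume_grad_surrogate(theta, z, eps):
    solve, _ = cg(lambda v: jtj_vec(theta, z, v), eps)
    a = jax.lax.stop_gradient(solve)
    return 0.5 * a @ jtj_vec(theta, z, eps)

theta, z = 1.5, jnp.array([0.3, -1.2])
eps = jax.random.normal(jax.random.PRNGKey(0), z.shape)
print(jax.grad(log_volume_exact)(theta, z))                 # exact gradient
print(jax.grad(log_volume_grad_surrogate)(theta, z, eps))   # one-sample estimate
```

The point of the surrogate is that it touches J only through Jacobian-vector and vector-Jacobian products, so its cost scales with a few passes through the network rather than with forming the D x d Jacobian, which is what makes the gradient of the volume-change term tractable when D is large.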