Efficient gradient computation of the Jacobian determinant term is a core problem in many machine learning settings, and especially so in the normalizing flow framework. Most proposed flow models therefore either restrict themselves to a function class whose Jacobian determinant is easy to evaluate, or rely on an efficient estimator thereof. However, these restrictions limit the performance of such density models, frequently requiring significant depth to reach desired performance levels. In this work, we propose Self Normalizing Flows, a flexible framework for training normalizing flows by replacing expensive terms in the gradient with learned approximate inverses at each layer. This reduces the computational complexity of each layer's exact update from $\mathcal{O}(D^3)$ to $\mathcal{O}(D^2)$, allowing for the training of flow architectures which were otherwise computationally infeasible, while also providing efficient sampling. We show experimentally that such models are remarkably stable and optimize to data likelihood values similar to those of their exact gradient counterparts, while training more quickly and surpassing the performance of functionally constrained counterparts.
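As a rough illustration of the gradient replacement described above, the sketch below assumes a single linear flow layer $z = Wx$ with a standard normal prior. Under that assumption, the exact gradient of the log-likelihood with respect to $W$ contains the term $(W^{-1})^\top$, which costs $\mathcal{O}(D^3)$ to compute; the sketch swaps it for the transpose of a learned approximate inverse $R \approx W^{-1}$, trained with a reconstruction loss. All variable names (`W`, `R`, `recon_weight`, `lr`) are illustrative, not taken from the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8
W = np.eye(D) + 0.01 * rng.standard_normal((D, D))   # forward weight of the flow layer
R = np.eye(D) + 0.01 * rng.standard_normal((D, D))   # learned approximate inverse, R ~= W^{-1}
lr, recon_weight = 1e-3, 1.0

for step in range(1000):
    x = rng.standard_normal(D)                        # a data sample
    z = W @ x                                         # forward transform

    # Exact gradient of log p(x) w.r.t. W is -z x^T + (W^{-1})^T, which needs a
    # matrix inverse (O(D^3)). The self-normalizing update uses R^T instead (O(D^2)).
    grad_W = -np.outer(z, x) + R.T

    # Reconstruction loss ||R z - x||^2 pushes R towards W^{-1}.
    grad_R = recon_weight * np.outer(R @ z - x, z)

    W += lr * grad_W                                  # gradient ascent on the approximate log-likelihood
    R -= lr * grad_R                                  # gradient descent on the reconstruction loss
```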