Normalizing flows are a powerful class of generative models that demonstrate strong performance on several speech and vision problems. In contrast to other generative models, normalizing flows are latent-variable models with tractable likelihoods and allow for stable training. However, they have to be carefully designed to represent invertible functions with efficient Jacobian determinant calculation. In practice, these requirements lead to overparameterized and sophisticated architectures that are inferior to alternative feed-forward models in terms of inference time and memory consumption. In this work, we investigate whether one can distill flow-based models into more efficient alternatives. We provide a positive answer to this question by proposing a simple distillation approach and demonstrating its effectiveness on state-of-the-art conditional flow-based models for image super-resolution and speech synthesis.
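To make the idea of distilling a conditional flow into a feed-forward student concrete, below is a minimal, hypothetical sketch: a pretrained flow teacher produces samples for a conditioning input (e.g., a low-resolution image), and a plain convolutional student is trained to regress onto those samples. The names `teacher_flow`, `student`, and the L1 objective are illustrative assumptions, not the paper's exact procedure or architectures.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins: in practice `teacher_flow` would be a pretrained
# conditional normalizing flow and `student` a compact feed-forward network
# with matching input/output shapes.
teacher_flow = nn.Identity()  # placeholder for a pretrained flow teacher
student = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 3, 3, padding=1),
)

optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)
criterion = nn.L1Loss()


def distillation_step(cond: torch.Tensor) -> float:
    """One distillation step: the student mimics a sample from the flow teacher."""
    with torch.no_grad():
        # For a real flow this would be a conditional sample,
        # e.g. teacher_flow.sample(cond) or an equivalent call.
        target = teacher_flow(cond)
    pred = student(cond)
    loss = criterion(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# Usage with a dummy batch of conditioning inputs (e.g., low-resolution images).
dummy_cond = torch.randn(4, 3, 32, 32)
print(distillation_step(dummy_cond))
```

The point of the sketch is only the training loop structure: the teacher is frozen and queried for targets, while the cheaper student absorbs its behavior and is the only model needed at inference time.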