Generative diffusion models have emerged as leading models in speech and image generation. However, to perform well with a small number of denoising steps, they require costly tuning of the set of noise parameters. In this work, we present a simple and versatile learning scheme that adjusts these noise parameters step by step for any given number of steps, whereas previous work must retune them separately for each step count. Furthermore, without modifying the weights of the diffusion model, we significantly improve the synthesis results for a small number of steps. Our approach comes at a negligible computational cost.
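The idea of tuning only the noise parameters while keeping the denoiser frozen can be sketched as a small gradient-descent loop. The sketch below is illustrative only: `toy_loss` is a hypothetical stand-in for the paper's objective (it merely rewards an evenly spaced, decreasing schedule ending near zero), and the function names are our own, not the authors'.

```python
# Sketch: learn K noise levels sigma_1..sigma_K for a fixed number of
# denoising steps by gradient descent, with the diffusion model frozen.
# toy_loss is a hypothetical surrogate, NOT the paper's actual objective.

def toy_loss(sigmas):
    # Stand-in for synthesis quality under a frozen denoiser: penalize
    # schedules that do not decrease evenly toward zero.
    loss = sigmas[-1] ** 2  # final noise level should be small
    for a, b in zip(sigmas, sigmas[1:]):
        loss += (b - a + 1.0 / len(sigmas)) ** 2  # even spacing
    return loss

def numeric_grad(f, x, eps=1e-5):
    # Central finite differences, so the sketch needs no autograd library.
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += eps
        xm[i] -= eps
        grad.append((f(xp) - f(xm)) / (2 * eps))
    return grad

def tune_schedule(k, iters=1000, lr=0.05):
    # One schedule per chosen step count k; the denoiser weights never change.
    sigmas = [1.0] * k  # start from a flat schedule
    for _ in range(iters):
        g = numeric_grad(toy_loss, sigmas)
        sigmas = [s - lr * gi for s, gi in zip(sigmas, g)]
    return sigmas
```

Because only K scalars are optimized, the tuning cost is negligible next to the diffusion model itself, which matches the abstract's claim; rerunning `tune_schedule` with a different `k` yields a schedule for that step count without any retraining.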