In this paper, we propose a vocoder based on a pair of forward and reverse-time linear stochastic differential equations (SDEs). The solutions of this SDE pair are two stochastic processes: the forward process turns the distribution of the waveforms we want to generate into a simple, tractable distribution, and the reverse-time process is the generation procedure that turns a sample from this tractable distribution back into the target waveform. We call the model It\^oWave. Driven by a Wiener process and conditioned on the original mel spectrogram, It\^oWave gradually removes the excess signal from a noise signal to produce realistic, meaningful audio that corresponds to the conditioning input. In our experiments, It\^oWave reaches a mean opinion score (MOS) of 4.35$\pm$0.115, exceeding the current state-of-the-art (SOTA) methods. The generated audio samples are available online\footnotemark[2].
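To illustrate the forward/reverse-time idea, the following is a minimal sketch on a one-dimensional toy problem, not the It\^oWave model itself. It assumes a variance-preserving linear SDE with a constant coefficient `beta` and a Gaussian "data" distribution, so that the score of every intermediate marginal is known in closed form; in a real vocoder that score would instead be estimated by a network conditioned on the mel spectrogram. All names and parameter values (`marginal_stats`, `reverse_sample`, `mu0`, `s0`, `beta`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def marginal_stats(t, mu0, s0, beta):
    """Mean and variance at time t of the forward linear SDE
    dx = -0.5 * beta * x dt + sqrt(beta) dw, started from N(mu0, s0^2)."""
    decay = np.exp(-beta * t)
    mean = mu0 * np.exp(-0.5 * beta * t)
    var = s0 ** 2 * decay + (1.0 - decay)
    return mean, var


def reverse_sample(n, mu0=2.0, s0=0.5, beta=10.0, T=1.0, steps=500, seed=0):
    """Integrate the reverse-time SDE from t = T down to t = 0 with
    Euler-Maruyama, using the analytic score of the toy Gaussian marginals."""
    rng = np.random.default_rng(seed)
    dt = T / steps
    # Start from the tractable prior, which is close to N(0, 1) at t = T.
    x = rng.standard_normal(n)
    for i in range(steps):
        t = T - i * dt
        mean, var = marginal_stats(t, mu0, s0, beta)
        score = -(x - mean) / var              # gradient of log p_t(x)
        drift = -0.5 * beta * x - beta * score  # f(x, t) - g(t)^2 * score
        # One step backward in time: subtract the drift, add fresh noise.
        x = x - drift * dt + np.sqrt(beta * dt) * rng.standard_normal(n)
    return x


samples = reverse_sample(20000)
print(samples.mean(), samples.std())
```

Starting from pure noise, the reverse-time integration should return samples whose mean and standard deviation are close to those of the toy data distribution (`mu0` and `s0`), which is the same role the generation process plays for waveforms in the abstract.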