We propose a unified approach to data-driven source-filter modeling that uses a single neural network to build a neural vocoder capable of generating high-quality synthetic speech waveforms while retaining the flexibility of the source-filter model to control voice characteristics. The proposed network, called unified source-filter generative adversarial networks (uSFGAN), is developed by factorizing quasi-periodic parallel WaveGAN (QPPWG), a neural vocoder based on a single neural network, into a source excitation generation network and a vocal tract resonance filtering network by additionally introducing a regularization loss. Moreover, inspired by the neural source-filter (NSF) model, only a sinusoidal waveform is used as an additional input, serving as the simplest cue for generating a periodic source excitation waveform while minimizing the effect of approximations in the source-filter model. The experimental results demonstrate that uSFGAN outperforms conventional neural vocoders, such as QPPWG and NSF, in both speech quality and pitch controllability.
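To make the sinusoidal input concrete, the following is a minimal sketch of how a sample-level sine wave could be derived from a frame-level F0 contour by phase accumulation. It is an illustration only, not the authors' implementation: the function name `sine_from_f0`, the sample rate, hop size, amplitude, and the choice to output zeros in unvoiced frames are all assumptions made for this example.

```python
import numpy as np


def sine_from_f0(f0_hz, sample_rate=24000, hop_size=120, amplitude=0.1):
    """Build a sample-level sinusoid from a frame-level F0 contour.

    Hypothetical helper: frames with f0 == 0 are treated as unvoiced
    and produce zeros (an assumption made for this sketch).
    """
    f0_hz = np.asarray(f0_hz, dtype=np.float64)
    # Upsample frame-level F0 to sample level by simple repetition.
    f0_samples = np.repeat(f0_hz, hop_size)
    # Boolean mask of voiced samples, used to silence unvoiced regions.
    voiced = f0_samples > 0
    # Instantaneous phase is the running sum of per-sample phase increments.
    phase = 2.0 * np.pi * np.cumsum(f0_samples / sample_rate)
    return amplitude * np.sin(phase) * voiced


# Example: 10 voiced frames at 100 Hz followed by 5 unvoiced frames.
sine = sine_from_f0([100.0] * 10 + [0.0] * 5)
```

Scaling the F0 contour before calling such a function changes the pitch of the sinusoid, which is the kind of control signal a source excitation network could be conditioned on.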