Recently, non-autoregressive neural vocoders have provided remarkable performance in generating high-fidelity speech and have been able to produce synthetic speech in real-time. However, non-autoregressive neural vocoders such as WaveGlow lag far behind autoregressive neural vocoders like WaveFlow in terms of modeling audio signals due to their limited expressiveness. In addition, though NanoFlow is a state-of-the-art autoregressive neural vocoder with a very small number of parameters, its performance is marginally lower than that of WaveFlow. Therefore, in this paper, we propose a new type of autoregressive neural vocoder called FlowVocoder, which has a small memory footprint and is able to generate high-fidelity audio in real-time. Our proposed model improves the expressiveness of flow blocks by employing a mixture of Cumulative Distribution Functions (CDFs) for the bipartite transformation. Hence, the proposed model is capable of modeling waveform signals as well as WaveFlow, while its memory footprint is much smaller than WaveFlow's. As shown in experiments, FlowVocoder achieves competitive results with baseline methods in terms of both subjective and objective evaluation, and it is more suitable for real-time text-to-speech applications.
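The mixture-of-CDFs coupling transform referred to above can be illustrated with a minimal sketch. This is only an assumed, simplified illustration: the conditioner `param_net`, the tensor shapes, and all variable names here are hypothetical and not taken from the paper's implementation; a mixture of logistic CDFs is used as a concrete example of such a transform.

```python
import numpy as np

def mixture_logistic_cdf(x, log_pi, mu, log_s):
    """Mixture-of-logistics CDF: sum_k pi_k * sigmoid((x - mu_k) / s_k).

    x:      (..., 1) values to transform
    log_pi: (..., K) unnormalized mixture log-weights
    mu:     (..., K) component means
    log_s:  (..., K) component log-scales
    """
    # softmax over the K mixture components
    pi = np.exp(log_pi - log_pi.max(axis=-1, keepdims=True))
    pi = pi / pi.sum(axis=-1, keepdims=True)
    # standardized inputs for each component
    z = (x - mu) * np.exp(-log_s)
    # weighted sum of logistic CDFs, mapping x into (0, 1)
    return (pi / (1.0 + np.exp(-z))).sum(axis=-1)

def coupling_forward(x_a, x_b, param_net):
    """Bipartite (coupling) step: x_a conditions the CDF applied to x_b."""
    log_pi, mu, log_s = param_net(x_a)   # hypothetical conditioner network
    y_b = mixture_logistic_cdf(x_b[..., None], log_pi, mu, log_s)
    return x_a, y_b                      # x_a passes through unchanged

# Toy usage with a fixed, hand-written conditioner (for illustration only)
if __name__ == "__main__":
    K = 3
    x_a = np.random.randn(8)
    x_b = np.random.randn(8)
    toy_net = lambda h: (np.zeros((8, K)), np.zeros((8, K)), np.zeros((8, K)))
    _, y_b = coupling_forward(x_a, x_b, toy_net)
    print(y_b)  # values in (0, 1)
```

Because the CDF of a mixture is strictly monotonic, this elementwise transform remains invertible, which is what allows it to replace a plain affine coupling while adding expressiveness.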