Triple M: A Practical Neural Text-to-Speech System with Multi-Guidance Attention and Multi-Band Multi-Time LPCNet

Although sequence-to-sequence networks with attention mechanisms and neural vocoders have greatly improved the quality of speech synthesis, several problems remain for large-scale real-time applications. For example, alignment failures on long sentences must be avoided while rich prosody is maintained, and computational overhead must be reduced while perceptual quality is preserved. To address these issues, we propose a practical neural text-to-speech system, named Triple M, consisting of a seq2seq model with multi-guidance attention and a multi-band multi-time LPCNet. The former uses the alignment results of different attention mechanisms to guide the learning of the basic attention mechanism, and retains only the basic attention mechanism during inference. This approach improves the performance of the text-to-feature module by absorbing the advantages of all the guidance attention methods without modifying the basic inference architecture. The latter reduces the computational complexity of LPCNet by combining multi-band and multi-time strategies. The multi-band strategy enables LPCNet to generate sub-band signals in each inference. By predicting the sub-band signals of adjacent time steps in one forward operation, the multi-time strategy further decreases the number of inferences required. With the combined multi-band and multi-time strategies, the vocoder runs 2.75x faster on a single CPU with only slight degradation in MOS (mean opinion score).
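To make the effect of the multi-band and multi-time strategies on the number of inferences concrete, the following is a minimal sketch, not the authors' implementation. It only counts autoregressive forward passes under the assumption that each pass emits one sample per sub-band for each of several adjacent time steps. The function name `autoregressive_steps`, the 16 kHz sample rate, and the settings of 4 bands and 2 time steps are illustrative assumptions and do not come from the abstract.

```python
import math


def autoregressive_steps(num_samples: int, bands: int = 1, times: int = 1) -> int:
    """Count forward passes when each pass emits `bands * times` samples.

    Hypothetical helper: `bands` is the number of sub-bands generated per
    pass and `times` is the number of adjacent time steps predicted per pass.
    """
    if num_samples < 0 or bands < 1 or times < 1:
        raise ValueError("num_samples must be >= 0; bands and times must be >= 1")
    return math.ceil(num_samples / (bands * times))


# One second of audio at an assumed 16 kHz sample rate.
samples = 16_000

baseline = autoregressive_steps(samples)                           # sample-by-sample
multi_band = autoregressive_steps(samples, bands=4)                # assumed 4 sub-bands
multi_band_time = autoregressive_steps(samples, bands=4, times=2)  # plus 2 time steps

print(baseline, multi_band, multi_band_time)  # prints: 16000 4000 2000
```

The step count is not the same as wall-clock time: each pass predicts more outputs and therefore does more work, so the reduction in steps is at best an upper bound on the achievable speedup. The abstract reports a measured vocoder speedup of 2.75x on a single CPU.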

Related content

Attention mechanism

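The attention mechanism was first proposed in computer vision, but it became widely popular with the Google DeepMind paper "Recurrent Models of Visual Attention" [14], which applied attention to an RNN model for image classification. Bahdanau et al. then used an attention-like mechanism in "Neural Machine Translation by Jointly Learning to Align and Translate" [1] to perform translation and alignment jointly on machine translation tasks; their work is generally regarded as the first application of attention to NLP. Attention-based extensions of RNN models were subsequently applied to a wide range of NLP tasks. More recently, how to use attention in CNNs has also become an active research topic. The figure below shows the general trend of progress in attention research.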
