LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search

Text to speech (TTS) has been broadly used to synthesize natural and intelligible speech in different scenarios. Deploying TTS on end devices such as mobile phones or embedded devices requires extremely small memory usage and low inference latency. While non-autoregressive TTS models such as FastSpeech achieve significantly faster inference than autoregressive models, their model size and inference latency are still too large for deployment on resource-constrained devices. In this paper, we propose LightSpeech, which leverages neural architecture search (NAS) to automatically design more lightweight and efficient models based on FastSpeech. We first profile the components of the current FastSpeech model and carefully design a novel search space containing various lightweight and potentially effective architectures. NAS is then used to automatically discover well-performing architectures within the search space. Experiments show that the model discovered by our method achieves a 15x model compression ratio and a 6.5x inference speedup on CPU with on-par voice quality. Audio demos are provided at https://speechresearch.github.io/lightspeech.
