We describe our approach to creating and delivering a custom voice for a conversational AI use case. Specifically, we provide a voice for a Digital Einstein character to enable human-computer interaction within a digital conversation experience. To create a voice that fits the context, we first design a voice character and produce recordings that match the desired speech attributes. We then model the voice. Our solution uses FastSpeech 2 to predict log-scaled mel-spectrograms from phonemes and Parallel WaveGAN to generate the waveforms. The system takes character input and outputs a speech waveform. We use a custom dictionary for selected words to ensure their correct pronunciation. Our proposed cloud architecture enables fast voice delivery, making it possible to talk to the digital version of Albert Einstein in real time.
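The flow from characters to waveform, with a custom dictionary taking precedence for selected words, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the system described here: the lexicon entry, the phoneme symbols, the spell-out fallback, and the function names `text_to_phonemes` and `synthesize` are all hypothetical, and the acoustic model and vocoder are passed in as plain callables standing in for trained FastSpeech 2 and Parallel WaveGAN models.

```python
import re
from typing import Callable, Dict, List, Sequence

# Hypothetical custom dictionary; the entry and its phoneme symbols are illustrative only.
CUSTOM_LEXICON: Dict[str, List[str]] = {
    "einstein": ["AY1", "N", "S", "T", "AY2", "N"],
}


def fallback_g2p(word: str) -> List[str]:
    """Placeholder for a real grapheme-to-phoneme model: just spells the word out."""
    return list(word.upper())


def text_to_phonemes(
    text: str,
    lexicon: Dict[str, List[str]] = CUSTOM_LEXICON,
    g2p: Callable[[str], List[str]] = fallback_g2p,
) -> List[str]:
    """Convert character input to phonemes; custom dictionary entries win over g2p."""
    phonemes: List[str] = []
    for word in re.findall(r"[a-z']+", text.lower()):
        phonemes.extend(lexicon[word] if word in lexicon else g2p(word))
    return phonemes


def synthesize(
    text: str,
    acoustic_model: Callable[[List[str]], Sequence[Sequence[float]]],
    vocoder: Callable[[Sequence[Sequence[float]]], Sequence[float]],
) -> Sequence[float]:
    """Characters -> phonemes -> mel-spectrogram frames -> waveform samples."""
    phonemes = text_to_phonemes(text)
    mel = acoustic_model(phonemes)  # stand-in for mel-spectrogram prediction
    return vocoder(mel)             # stand-in for waveform generation
```

Keeping the dictionary lookup ahead of the generic grapheme-to-phoneme step means a pronunciation fix for a single word is a data change rather than a model change.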