Text-to-speech (TTS) systems offer the opportunity to compensate for hearing loss at the source rather than correcting for it at the receiving end. This removes limitations, such as the time constraints on amplification algorithms in a hearing aid, and can lead to higher speech quality. We propose an algorithm that restores loudness to normal perception at a high resolution in time, frequency, and level, and embed it in a TTS system that uses Tacotron2 and WaveGlow to produce individually amplified speech. Subjective evaluations showed that the proposed algorithm produced high-quality audio, with sound quality similar to that of original or linearly amplified speech but considerably higher speech intelligibility in noise. Transfer learning quickly adapted the produced spectra from original speech to individually amplified speech and resulted in high speech quality and intelligibility, and thus offers an efficient way to train an individual TTS system.
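To make the idea of amplification that depends on time, frequency, and level more concrete, the following is a minimal sketch and not the algorithm proposed here. It assumes a simple rule that linearly maps the normal dynamic range (in dB) onto a listener's reduced range in each frequency band; the function names, the 100 dB uncomfortable level, and the example thresholds are all hypothetical and chosen only for illustration.

```python
import numpy as np


def compensation_gain_db(level_db, threshold_db, ucl_db=100.0,
                         normal_threshold_db=0.0):
    """Per-bin gain in dB for a (frames x bands) array of levels.

    Assumption (illustrative only): the range from the normal threshold
    up to the uncomfortable level (ucl_db) is mapped linearly, in dB,
    onto the range from the listener's raised threshold up to ucl_db.
    """
    level_db = np.asarray(level_db, dtype=float)
    threshold_db = np.asarray(threshold_db, dtype=float)

    # Compression slope per band; below 1 when the threshold is raised.
    slope = (ucl_db - threshold_db) / (ucl_db - normal_threshold_db)

    # Level the listener needs in order to perceive normal loudness.
    target_db = threshold_db + slope * (level_db - normal_threshold_db)

    # Soft sounds get the most gain; never attenuate in this sketch.
    return np.clip(target_db - level_db, 0.0, None)


def amplify_levels_db(level_db, threshold_db):
    """Return the amplified levels (dB) for every time-frequency bin."""
    return np.asarray(level_db, dtype=float) + compensation_gain_db(
        level_db, threshold_db
    )


# Hypothetical listener: 20 dB loss in band 0, 60 dB loss in band 1.
thresholds = np.array([20.0, 60.0])
# Three frames at 0, 50, and 100 dB in both bands.
levels = np.array([[0.0, 0.0], [50.0, 50.0], [100.0, 100.0]])
gains = compensation_gain_db(levels, thresholds)
```

Because the gain is computed independently for each time-frequency bin, a rule of this kind could in principle be applied to the spectra a TTS model is trained to produce rather than to a finished waveform, which is the setting in which the hearing-aid time constraints mentioned above do not apply.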