Text-to-speech (TTS) offers the opportunity to compensate for a hearing loss at the source rather than correcting for it at the receiving end. This removes limitations that apply to algorithms that amplify sound individually, such as time constraints, and can lead to higher speech quality for hearing-impaired listeners. We propose an algorithm that restores loudness to normal perception at a high resolution in time, frequency and level. We embed it in a TTS system that uses Tacotron2 and WaveGlow to produce individually amplified speech. Subjective evaluations of speech quality showed that the proposed algorithm produced high-quality audio, and the STOI metric predicted the mean opinion scores well. Transfer learning quickly adapted the produced spectra from original speech to individually amplified speech, which provides an efficient way to train an individual TTS system.