The optimization of a wavelet-based algorithm for improving speech intelligibility is reported. The discrete-time speech signal is split into frequency sub-bands by a multi-level discrete wavelet transform, and different gains are applied to the sub-band signals before they are recombined to form a modified version of the speech. The sub-band gains are adjusted while the overall signal energy is kept unchanged, and the resulting speech intelligibility under various background-interference and simulated hearing-loss conditions is evaluated objectively and quantitatively using Google Speech-to-Text transcription. For noise-free English and Chinese speech, overall intelligibility is improved: transcription accuracy can be increased by as much as 80 percentage points by reallocating spectral energy toward the mid-frequency sub-bands, which effectively increases the consonant-vowel intensity ratio. This is reasonable because consonants are relatively weak and short in duration, and are therefore the sounds most likely to become indistinguishable in the presence of background noise or high-frequency hearing impairment. For speech already corrupted by noise, improving intelligibility is more challenging but still achievable. The proposed algorithm can be implemented for real-time signal processing and is simpler than previous algorithms. Potential applications include speech enhancement, hearing aids, machine listening, and a better understanding of speech intelligibility.
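The processing chain described in the abstract (multi-level wavelet decomposition, per-band gains, energy-preserving recombination) can be illustrated with a minimal sketch. The abstract does not state which wavelet, how many levels, or which gain values are used, so everything specific here is an assumption made for illustration: the orthonormal Haar wavelet, three decomposition levels, the gain vector, and the helper names `wavedec`, `waverec`, and `reweight` are all hypothetical and are not the authors' implementation.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_dwt(x):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    half = len(x) // 2
    approx = [(x[2 * i] + x[2 * i + 1]) / SQRT2 for i in range(half)]
    detail = [(x[2 * i] - x[2 * i + 1]) / SQRT2 for i in range(half)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleaves the reconstructed sample pairs."""
    x = []
    for a, d in zip(approx, detail):
        x.append((a + d) / SQRT2)
        x.append((a - d) / SQRT2)
    return x

def wavedec(x, levels):
    """Multi-level decomposition; bands are ordered [A_L, D_L, ..., D_1],
    i.e. from the lowest-frequency band to the highest."""
    if len(x) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    bands, a = [], list(x)
    for _ in range(levels):
        a, d = haar_dwt(a)
        bands.insert(0, d)
    bands.insert(0, a)
    return bands

def waverec(bands):
    """Recombine the sub-bands produced by wavedec into a time-domain signal."""
    a = bands[0]
    for d in bands[1:]:
        a = haar_idwt(a, d)
    return a

def energy(x):
    return sum(v * v for v in x)

def reweight(x, gains, levels):
    """Apply one gain per sub-band, then rescale so total energy is unchanged.

    The transform is orthonormal, so the energy of the coefficients equals
    the energy of the signal and the normalization can be done on the
    coefficients before reconstruction."""
    bands = wavedec(x, levels)
    if len(gains) != len(bands):
        raise ValueError("need one gain per sub-band (levels + 1 gains)")
    scaled = [[g * v for v in b] for g, b in zip(gains, bands)]
    e_in = sum(energy(b) for b in bands)
    e_out = sum(energy(b) for b in scaled)
    norm = math.sqrt(e_in / e_out) if e_out > 0 else 1.0
    return waverec([[norm * v for v in b] for b in scaled])

# Synthetic stand-in for a speech frame (assumed data, not real speech).
x = [math.sin(0.3 * n) + 0.2 * math.sin(2.5 * n) for n in range(64)]
# Illustrative gains for [A3, D3, D2, D1] that favor the middle bands.
gains = [0.5, 2.0, 2.0, 0.5]
y = reweight(x, gains, levels=3)
```

Only the ratios between the gains matter in this sketch, because the final normalization removes any common scale factor. In a practical setting a smoother wavelet and an established library would likely replace the hand-written Haar transform.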