The optimization of a wavelet-based algorithm to improve speech intelligibility is reported, along with the full data set and results. The discrete-time speech signal is split into frequency sub-bands via a multi-level discrete wavelet transform. Different gains are applied to the sub-band signals before they are recombined to form a modified version of the speech. The sub-band gains are adjusted while the overall signal energy is kept unchanged, and the resulting speech intelligibility under various background interference and simulated hearing loss conditions is evaluated objectively and quantitatively using Google Speech-to-Text transcription. A universal set of sub-band gains works over a range of noise-to-signal ratios up to 4.8 dB. For noise-free speech, reallocating the spectral energy toward the mid-frequency sub-bands improves overall intelligibility, increasing the Google transcription accuracy by 16.9 percentage points on average and by up to 86.7 percentage points. For speech already corrupted by noise, improving intelligibility is challenging but still achievable, with transcription accuracy increasing by 9.5 percentage points on average and by up to 71.4 percentage points. The proposed algorithm can be implemented for real-time speech processing and is simpler than previous algorithms. Potential applications include speech enhancement, hearing aids, machine listening, and a better understanding of speech intelligibility.
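The processing chain described above (multi-level wavelet decomposition, per-band gains, recombination, and energy preservation) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the wavelet family, the number of levels, or the optimized gain values, so the Haar wavelet, the function names, and the gains below are all assumptions chosen only to keep the example self-contained.

```python
import math

def haar_dwt(x):
    """One level of the orthonormal Haar DWT (assumed wavelet): (approx, detail)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: interleave the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    x = []
    for a, d in zip(approx, detail):
        x.append((a + d) / s)
        x.append((a - d) / s)
    return x

def energy(x):
    """Sum of squared samples."""
    return sum(v * v for v in x)

def apply_subband_gains(signal, gains):
    """Scale wavelet sub-bands, recombine, and restore the input energy.

    gains[0] scales the lowest band (final approximation); gains[1:] scale
    the detail bands from coarsest to finest. The number of decomposition
    levels is len(gains) - 1. All names here are hypothetical.
    """
    levels = len(gains) - 1
    if levels < 1 or len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")

    # Analysis: repeatedly split the approximation band (finest detail first).
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        details.append(detail)

    # Apply the per-band gains.
    approx = [gains[0] * v for v in approx]
    coarse_first = details[::-1]
    scaled = [[g * v for v in d] for g, d in zip(gains[1:], coarse_first)]

    # Synthesis: rebuild from the coarsest band up to the finest.
    for detail in scaled:
        approx = haar_idwt(approx, detail)
    out = approx

    # Rescale so the total signal energy matches the input.
    e_in, e_out = energy(signal), energy(out)
    if e_out > 0.0:
        scale = math.sqrt(e_in / e_out)
        out = [scale * v for v in out]
    return out

# Toy input: two sinusoids standing in for a speech frame.
toy_signal = [math.sin(0.3 * n) + 0.5 * math.sin(2.5 * n) for n in range(16)]
# Illustrative gains only (mid bands boosted); not the optimized values.
toy_gains = [0.5, 1.5, 2.0, 0.8]
modified = apply_subband_gains(toy_signal, toy_gains)
```

Because the final rescaling step normalizes the output, only the ratios between the gains matter in this sketch; multiplying all of them by a common factor leaves the result unchanged. A practical implementation would more likely use an established wavelet library and a smoother wavelet than Haar.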