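Project title: Research on Key Technologies of Audio-on-Demand Based on Syllable Models
Project number: No.61301218
Project type: Young Scientists Fund project
Year of approval: 2014
Discipline: Radio electronics and telecommunication technology
Project author: 吕勇 (Lü Yong)
Affiliation: 河海大学 (Hohai University)
Funding: CNY 230,000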
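Chinese abstract (translated): Chinese has many homophonous characters and relatively few syllables, so several characters map to one syllable. Exploiting this property, the project builds an audio index library for each Chinese syllable and recognizes the user's spoken input as a syllable sequence. In the matching and decoding stage, candidate audio items are first selected from the corresponding syllable entries of the audio index library according to the syllable sequence of the input speech, and the syllable sequence of the input speech is then matched against the syllable sequence of each candidate. Replacing traditional text matching with syllable-sequence matching improves decoding accuracy and reduces system complexity. In the front-end speech recognizer, nonlinear environment compensation is applied to additive noise, channel distortion, and room reverberation to make recognition more robust. In addition, the N-best algorithm selects the N most likely speech units as output, yielding multiple candidate syllable sequences for the speech to be recognized and thereby reducing the impact of front-end recognition errors on back-end syllable-sequence matching and decoding.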
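Chinese keywords (translated): Audio-on-demand; Syllable model; Speech recognition; Environment compensation; Reverberant speech processing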
English abstract: Chinese has a large number of homophones and a small number of syllables, so more than one Chinese character corresponds to the same syllable. In this project, an audio indexing library is therefore established for every Chinese syllable, and the input speech is recognized as a syllable sequence. In the syllable matching procedure, candidate audio tracks are selected from the audio indexing library according to the syllable information of the input speech, and the syllable sequence of the input speech is then compared with the syllable sequence of every candidate audio track. Replacing traditional text matching with syllable sequence matching improves the decoding accuracy and reduces the system complexity. In the front-end speech recognition procedure, nonlinear compensation is employed to compensate for additive noise, channel distortion, and room reverberation, which improves the robustness of the speech recognition system. Furthermore, the N-best algorithm is used to produce more than one candidate syllable sequence for the input speech, which reduces the impact of front-end recognition errors on the subsequent syllable sequence decoding.
English keywords: Audio-on-demand; Syllable model; Speech recognition; Environment compensation; Reverberant speech processing
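
Illustrative sketch (not part of the project record): the abstract describes a per-syllable audio index, candidate selection from that index, and matching of N-best syllable sequences against the candidates. The Python below is a minimal sketch of that flow under stated assumptions. The record does not specify the matching criterion, so Levenshtein edit distance is used here as a stand-in; the track ids, the pinyin syllables, and the function names `build_syllable_index`, `edit_distance`, and `match` are all hypothetical.

```python
from collections import defaultdict


def build_syllable_index(tracks):
    """Map each syllable to the set of track ids whose syllable sequence contains it."""
    index = defaultdict(set)
    for track_id, syllables in tracks.items():
        for syl in syllables:
            index[syl].add(track_id)
    return index


def edit_distance(a, b):
    """Levenshtein distance between two syllable sequences (assumed criterion)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # drop a syllable from a
                           cur[j - 1] + 1,           # insert a syllable from b
                           prev[j - 1] + (x != y)))  # substitute or keep
        prev = cur
    return prev[-1]


def match(nbest, tracks, index):
    """Return (track_id, distance) of the closest track over all hypotheses, or None."""
    best = None
    for hyp in nbest:
        # Candidate selection: only tracks sharing at least one syllable with the hypothesis.
        candidates = set()
        for syl in hyp:
            candidates |= index.get(syl, set())
        # Syllable-sequence matching against each candidate.
        for track_id in sorted(candidates):
            d = edit_distance(hyp, tracks[track_id])
            if best is None or d < best[1]:
                best = (track_id, d)
    return best


# Hypothetical catalogue: track id -> syllable sequence of its title.
tracks = {
    "song_a": ["yue", "liang", "dai", "biao", "wo", "de", "xin"],
    "song_b": ["tian", "mi", "mi"],
    "song_c": ["yue", "ya", "wan"],
}
index = build_syllable_index(tracks)

# Hypothetical N-best output (N = 2): the top hypothesis contains one recognition error.
nbest = [
    ["yue", "liang", "dai", "bao", "wo", "de", "xin"],
    ["yue", "liang", "dai", "biao", "wo", "de", "xin"],
]
result = match(nbest, tracks, index)
```

In this toy example the top hypothesis is one substitution away from `song_a`, while the second hypothesis matches it exactly, which illustrates how keeping several hypotheses can absorb a front-end error. A real system would presumably weight hypotheses by recognition score, which is omitted here.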