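Project title: Research on Articulatory-Feature-Based Hierarchical Decoding Methods for Chinese Speech Recognition
Project number: No.61503382
Project type: Young Scientists Fund
Approval year: 2016
Discipline: Other
Project author: 杨占磊 (Yang Zhanlei)
Author affiliation: Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)
Funding amount: CNY 220,000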
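Chinese abstract (English translation): Conventional speech recognition decoders expand candidate paths uniformly across the entire search space. They do not use auxiliary information to partition the search space, so they cannot intensify or prune the search according to how promising each subspace is, and decoding therefore involves a large amount of unnecessary computation. Conventional decoders also do not use auxiliary information to assess the correctness of candidate paths, so they cannot adjust the direction of path expansion during decoding. Building on multi-level acoustic and semantic cues and on the articulatory feature system of Mandarin Chinese, this project will explore methods for automatically extracting articulatory features, an auxiliary source of information that gives a more stable representation of speech from the perspective of speech production. On this basis, it will build an articulatory model and explore a two-level decoding method based on articulatory information: the first level uses the articulatory model to partition the search space, and the second level uses the acoustic model to expand candidate paths within search subspaces of differing promise. The project will also explore methods for assessing candidate paths using articulatory information. The assessment results will be used to adjust the expansion direction of candidate paths promptly during decoding, with the aim of a speech recognition method consistent with the cognitive process by which the human brain evaluates candidate hypotheses using heuristic cues.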
Chinese keywords (English translation): decoding algorithm;search space;feature extraction;acoustic model;articulatory feature
English abstract: Traditional automatic speech recognition (ASR) decoding methods extend path candidates indiscriminately across the whole search space. They make no use of auxiliary information to divide the search space and therefore cannot intensify or prune the search according to how promising each subspace is, which causes a large amount of unnecessary computation. In addition, traditional decoding methods do not use auxiliary information to assess the confidence of path candidates and so cannot adjust the direction of extension during decoding. On the basis of multiple acoustic and semantic cues, as well as the articulatory feature (AF) framework of Mandarin speech, this study will explore methods for automatic AF extraction. As a kind of auxiliary information, AFs provide a stable representation of speech from the perspective of speech production. The study then intends to explore AF modeling and a two-level decoding method that integrates articulatory information. The first level of decoding uses the articulatory model to divide the search space into several subspaces, and the second level uses the acoustic model to extend path candidates in the resulting subspaces according to how promising they are. Furthermore, the study will explore methods for assessing path candidates based on articulatory information. The assessment results are then integrated into the decoding process to guide the extension direction of path candidates, giving rise to a novel ASR method that conforms to the cognitive process by which the human brain assesses candidate hypotheses using heuristic cues.
English keywords: decoding algorithm;search space;feature extraction;acoustic model;articulatory feature