Research on Time-Varying Robust Voiceprint Features in Speaker Recognition

Project title: Research on Time-Varying Robust Voiceprint Features in Speaker Recognition

Project number: No. 61271389

Project type: General Program (面上项目)

Year of approval: 2013

Discipline: Radio electronics; telecommunications technology

Project author: 郑方 (Zheng Fang)

Affiliation: Tsinghua University

Funding: 800,000 yuan (80万元)

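Chinese abstract (translated): Speaker recognition has a wide range of applications and is of strategic importance to public security and national defense. A person's voiceprint changes over time, which seriously degrades the accuracy of speaker recognition; this is the time-varying phenomenon of voiceprints. This project addresses the phenomenon by starting from voiceprint features and studying the time-varying robustness of speaker recognition. The project plans to build a speech database that supports in-depth study of voiceprint time variation. On the basis of this database, using a data-driven approach and following the idea of the F-ratio, it will explore how parameters based on frequency-band energy and parameters based on the cross-sectional area ratios of short tubes in the vocal tract model behave in terms of discriminating individual speakers and the stability of their probability distributions, and it will study a formula for a time-varying robustness criterion for speaker recognition. Combining the mechanisms of speech production and auditory perception, it will modify the computation of speech features through tube merging, frequency warping, and amplitude weighting to obtain time-varying-robust voiceprint feature extraction algorithms. It will study criteria for judging the relative time-varying robustness of different voiceprint features, to guide feature selection and fusion. Finally, it will build a prototype system to verify the correctness and effectiveness of the voiceprint feature extraction algorithms studied.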
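The abstract refers to the F-ratio without spelling it out. As a rough illustration only, the sketch below computes the textbook F-ratio of a single feature dimension: the variance of the per-speaker means divided by the average within-speaker variance. It is not the project's time-varying robustness criterion, which the abstract does not give. The function name `f_ratio`, the speaker labels, and the sample values are all assumptions made up for this example.

```python
from statistics import fmean, pvariance


def f_ratio(values_by_speaker):
    """Textbook F-ratio of one feature dimension (illustrative sketch).

    values_by_speaker maps a speaker label to that speaker's values of the
    feature, for example the log energy of one frequency band over frames.
    Raises ZeroDivisionError if every speaker's values are constant.
    """
    groups = list(values_by_speaker.values())
    # Between-speaker spread: variance of the per-speaker means.
    between = pvariance([fmean(g) for g in groups])
    # Within-speaker spread: average of the per-speaker variances.
    within = fmean([pvariance(g) for g in groups])
    return between / within


# Hypothetical data: band_a separates the two speakers well, band_b does not.
band_a = {"spk1": [1.0, 1.2, 0.8], "spk2": [3.0, 3.2, 2.8]}
band_b = {"spk1": [1.0, 3.0, 2.0], "spk2": [2.2, 1.2, 3.2]}

print(f_ratio(band_a), f_ratio(band_b))
```

A larger value suggests that the feature varies more between speakers than within a speaker. A time-varying study of the kind described would presumably also compare such statistics across recording sessions, which this sketch does not attempt.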

Chinese keywords (translated): speaker recognition; time-varying robustness; feature extraction

English abstract: Speaker recognition, also known as voiceprint recognition, is widely applicable and has strategic significance for both public security and national defense. A speaker's voiceprint changes over time, which is called the time-varying phenomenon of voiceprints. This project studies voiceprint features to address this issue and improve the time-varying robustness of speaker recognition technologies. A voiceprint database designed for in-depth study of the time-varying issue will be created. Using the F-ratio idea and a data-driven methodology, the project will explore how parameters based on frequency band energies, and on the area ratios of adjacent tubes in the vocal tract model, affect the discrimination of speaker-specific information and the stability of its probability distribution; furthermore, a formula for calculating the degree of time-varying robustness in speaker recognition will be proposed. Various modifications to feature calculation will be tested, including tube merging, frequency warping, and amplitude weighting, informed by the mechanisms of human speech production and perception. A criterion for determining the degree of time-varying robustness at the voiceprint feature level will also be proposed to guide feature selection and fusion. Finally, a prototype system w

English keywords: speaker recognition; long-term speaker variability; feature extraction
