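Project title: Research on training algorithms for voice conversion functions with non-parallel corpora
Project number: No. 61201301
Project type: Young Scientists Fund project
Year of approval: 2013
Discipline: Electronics and Information Systems
Project author: Jian Zhihua (简志华)
Author affiliation: Hangzhou Dianzi University (杭州电子科技大学)
Funding: 240,000 yuan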
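Chinese abstract: Voice conversion modifies the speaker-specific characteristics in one speaker's speech so that it carries those of another speaker: the converted speech sounds as if it were spoken by the target speaker, while all other information in the speech is left unchanged. This project studies training algorithms for the voice conversion function when the corpora are non-parallel. Specifically, the main content is: first, use Gaussian mixture models to classify the feature parameters of the source and target speech into phoneme classes, separately for each speaker; second, on the basis of this classification, use the KL distance to match identical or similar phoneme classes between the source and target speech; third, within corresponding source and target phoneme classes, align the two feature-parameter sequences according to the nearest-acoustic-distance criterion and train the voice conversion function from the aligned data; fourth, study the conversion of the prosodic characteristics of the speech signal in light of the auditory properties of the human ear. The goal of the project is to explore high-quality, effective voice conversion algorithms and to build a conversion system of real practical value. Because voice conversion is an emerging technology in speech processing that draws on a broad range of theory and has great application value, this research has both theoretical significance and practical value.
Chinese keywords: voice conversion; non-parallel corpus; inter-frame dynamic information; small-sample training data; Gaussian mixture model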
English abstract: The goal of voice conversion is to modify a source speaker's speech signal so that it is perceived as if it had been uttered by a target speaker, without altering the semantic content. In this proposal, we study training algorithms for voice conversion with non-parallel corpora. More specifically, our research focuses on four aspects. First, a Gaussian mixture model (GMM) is used to classify the phonemes of the source speech and the target speech separately. Second, to find corresponding phoneme classes, we match the individual Gaussian components of the source speaker's GMM to those of the target speaker's GMM, and vice versa, according to the Kullback-Leibler (KL) distance, based on the results of the phoneme classification. Third, we align frames of phonetically equivalent acoustic vectors from the source and target speakers within their mapped sub-spaces rather than in the whole space, and then use the frame-aligned feature vectors to train the conversion function. Finally, prosody modification is conducted according to auditory characteristics. In summary, our goal is to study training algorithms for voice conversion that yield high-quality converted speech with good similarity to the target speech. Voice conversion is a new technology which covers a
English keywords: voice conversion; non-parallel corpus; inter-frame dynamic information; limited training data; Gaussian mixture model
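
The second and third steps of the abstract (matching Gaussian components by KL distance, then aligning frames by nearest acoustic distance) can be illustrated with a minimal sketch. It is not the project's implementation. It assumes diagonal-covariance Gaussians, uses a symmetrised KL divergence as the "KL distance", matches in the source-to-target direction only, and takes squared Euclidean distance as the acoustic distance. The function names `kl_diag_gauss`, `match_components`, and `align_frames`, and all numeric values, are hypothetical.

```python
import numpy as np


def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(p || q) for two diagonal-covariance Gaussians."""
    return 0.5 * np.sum(
        np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    )


def match_components(src_mu, src_var, tgt_mu, tgt_var):
    """For each source component, return the index of the target component
    with the smallest symmetrised KL divergence (an assumed choice)."""
    dist = np.empty((len(src_mu), len(tgt_mu)))
    for i in range(len(src_mu)):
        for j in range(len(tgt_mu)):
            dist[i, j] = kl_diag_gauss(
                src_mu[i], src_var[i], tgt_mu[j], tgt_var[j]
            ) + kl_diag_gauss(tgt_mu[j], tgt_var[j], src_mu[i], src_var[i])
    return dist.argmin(axis=1)


def align_frames(src_frames, tgt_frames):
    """Pair each source frame with its nearest target frame, using squared
    Euclidean distance as the (assumed) acoustic distance."""
    diff = src_frames[:, None, :] - tgt_frames[None, :, :]
    return (diff ** 2).sum(axis=2).argmin(axis=1)


# Toy component parameters for two "phoneme classes" per speaker.
src_mu = np.array([[0.0, 0.0], [5.0, 5.0]])
src_var = np.ones((2, 2))
tgt_mu = np.array([[5.2, 4.9], [0.1, -0.2]])
tgt_var = np.ones((2, 2))
mapping = match_components(src_mu, src_var, tgt_mu, tgt_var)

# Toy frames assumed to belong to one matched pair of classes.
src_frames = np.array([[0.0, 0.0], [1.0, 1.0]])
tgt_frames = np.array([[0.9, 1.2], [0.1, -0.1], [3.0, 3.0]])
pairs = align_frames(src_frames, tgt_frames)

print(mapping.tolist(), pairs.tolist())
```

In a fuller pipeline, `align_frames` would be called once per matched class pair, and the resulting frame pairs would serve as the training data for the conversion function.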