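Project title: Research on Speaker Voice Conversion with Non-Parallel Corpora and Non-Joint Training Based on Structured Statistical Acoustic Models
Project number: No.61271360
Project type: General Program
Year of initiation/approval: 2013
Project discipline: Radio electronics; telecommunications technology
Project author: Yu Yibiao (俞一彪)
Author affiliation: Soochow University (苏州大学)
Project funding: 650,000 yuan (RMB)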
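Chinese abstract: Speaker voice conversion transforms a source speaker's speech into that of a target speaker while keeping the linguistic content unchanged. It has broad application value and is one of the main active topics in current speech processing research. Existing voice conversion systems generally use parallel corpora to train a joint source-target speaker speech model, from which the conversion formula is derived. In practice, however, parallel corpora are hard to obtain, joint model training requires precise speech alignment and heavy computation, and extending the system is inconvenient. This project aims to study and propose an effective, high-performance speaker voice conversion method that uses non-parallel corpora and non-joint training. The main research topics are: (1) analysis and study of structured statistical acoustic models of speaker speech; (2) methods for matching structured statistical acoustic models between speakers and aligning their feature distributions; (3) derivation of the short-time spectrum conversion formula; (4) multi-prosody models of speaker speech and conversion control; (5) subjective and objective evaluation of voice conversion performance.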
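Chinese keywords: voice conversion; non-parallel corpus; structured Gaussian mixture model; global acoustic structure; constrained expectation maximization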
English abstract: Voice conversion converts the speech of a source speaker into that of a target speaker. As one of the hottest research topics in speech processing, it is significant for a wide range of applications. Most current voice conversion systems need a parallel speech corpus from both the source and target speakers to jointly train a combined speech model, from which the spectral transform function is derived. However, parallel corpora are difficult to obtain in practice, and joint training of the combined model is computationally expensive and makes the system inflexible for new users. This project focuses on innovative voice conversion technology that requires neither a parallel speech corpus nor joint training. The main contents are: (1) structured statistical acoustic models of speaker voice; (2) matching and alignment of structured statistical acoustic models; (3) the transform function for the speech spectrum; (4) multi-prosody models and transform control; (5) objective and subjective evaluation of transform performance.
English keywords: voice conversion; non-parallel; structured Gaussian mixture model; acoustic universal structure; constrained expectation maximization