Most people who have tried to learn a foreign language have had difficulty understanding a native speaker's accent or speaking with one. For native speakers, understanding or speaking in an unfamiliar accent is likewise difficult. An accent conversion system that changes a speaker's accent while preserving that speaker's voice identity, such as timbre and pitch, has potential applications in communication, language learning, and entertainment. Existing accent conversion models tend to change the speaker identity and the accent at the same time. Here, we use adversarial learning to disentangle accent-dependent features while retaining other acoustic characteristics. What sets our work apart from existing accent conversion models is the ability to convert an unseen speaker's utterance to multiple accents while preserving the speaker's original voice identity. Subjective evaluations show that our model generates audio that sounds closer to the target accent while still sounding like the original speaker.