Mandarin Chinese is known to be strongly influenced by a rich set of regional accents, while Mandarin speech data for each individual accent is scarce. Hence, an important task in Mandarin speech recognition is to appropriately model the acoustic variability introduced by accents. This paper investigates the implicit and explicit use of accent information in a range of deep neural network (DNN) based acoustic modelling techniques. Approaches to multi-accent modelling, including multi-style training, multi-accent decision tree state tying, DNN tandem and multi-level adaptive network (MLAN) tandem hidden Markov model (HMM) modelling, are combined and compared. On a low-resource accented Mandarin speech recognition task covering four regional accents, an improved MLAN tandem HMM system that explicitly leverages accent information is proposed; it significantly outperforms the baseline accent-independent DNN tandem systems by 0.8%-1.5% absolute (6%-9% relative) in character error rate after sequence-level discriminative training and adaptation.