Bayesian Learning for Deep Neural Network Adaptation

A key task for speech recognition systems is to reduce the mismatch between training and evaluation data, which is often attributable to speaker differences. Speaker adaptation techniques play a vital role in reducing this mismatch. Model-based speaker adaptation approaches often require sufficient amounts of target speaker data to ensure robustness. When the amount of speaker-level data is limited, speaker adaptation is prone to overfitting and poor generalization. To address this issue, this paper proposes a full Bayesian learning based DNN speaker adaptation framework to model speaker-dependent (SD) parameter uncertainty given limited speaker-specific adaptation data. This framework is investigated in three forms of model-based DNN adaptation techniques: Bayesian learning of hidden unit contributions (BLHUC), Bayesian parameterized activation functions (BPAct), and Bayesian hidden unit bias vectors (BHUB). In all three methods, deterministic SD parameters are replaced by latent variable posterior distributions for each speaker, whose parameters are efficiently estimated using a variational inference based approach. Experiments conducted on LF-MMI TDNN/CNN-TDNN systems trained on the 300-hour speed-perturbed Switchboard corpus suggest that the proposed Bayesian adaptation approaches consistently outperform deterministic adaptation on the NIST Hub5'00 and RT03 evaluation sets. When using only the first five utterances from each speaker as adaptation data, significant word error rate reductions of up to 1.4% absolute (7.2% relative) were obtained on the CallHome subset. The efficacy of the proposed Bayesian adaptation techniques is further demonstrated in a comparison against the state-of-the-art performance obtained on the same task using the most recent systems reported in the literature.
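The idea of replacing a deterministic SD parameter with a per-speaker posterior can be illustrated with a minimal sketch for the LHUC case. The abstract does not specify the details, so everything here is an assumption made for illustration: a diagonal Gaussian posterior over the LHUC vector, a standard normal prior, a 2*sigmoid scaling activation, and the hypothetical function names `kl_gaussian` and `blhuc_forward`. In a variational setup of this kind, the per-speaker objective would typically be the recognition loss on the adaptation data plus the KL term computed below.

```python
import numpy as np

def kl_gaussian(mu, sigma, prior_mu=0.0, prior_sigma=1.0):
    """KL(N(mu, sigma^2) || N(prior_mu, prior_sigma^2)), summed over dimensions."""
    return float(np.sum(
        np.log(prior_sigma / sigma)
        + (sigma ** 2 + (mu - prior_mu) ** 2) / (2.0 * prior_sigma ** 2)
        - 0.5
    ))

def blhuc_forward(hidden, mu, log_sigma, rng):
    """Scale hidden activations by a sampled speaker-dependent LHUC vector.

    Assumption: the posterior is a diagonal Gaussian parameterized by
    (mu, log_sigma), sampled with the reparameterization trick.
    """
    sigma = np.exp(log_sigma)
    eps = rng.standard_normal(mu.shape)   # noise independent of the parameters
    r = mu + sigma * eps                  # one sample of the latent SD vector
    scale = 2.0 / (1.0 + np.exp(-r))      # assumed 2*sigmoid activation, range (0, 2)
    return hidden * scale

# Toy usage: 4 frames, 8 hidden units, posterior initialized at the prior.
rng = np.random.default_rng(0)
hidden = rng.standard_normal((4, 8))
mu = np.zeros(8)
log_sigma = np.zeros(8)

adapted = blhuc_forward(hidden, mu, log_sigma, rng)
kl = kl_gaussian(mu, np.exp(log_sigma))   # zero when posterior equals prior
```

With little adaptation data, the KL term keeps the posterior close to the prior, which is one way to read the abstract's claim that modeling parameter uncertainty helps against overfitting. When the posterior variance shrinks toward zero, the sketch reduces to ordinary deterministic LHUC scaling.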
