We propose a variational Bayesian (VB) approach to learning distributions of latent variables in deep neural network (DNN) models for cross-domain knowledge transfer, to address acoustic mismatches between training and testing conditions. Instead of performing point estimation, as in conventional maximum a posteriori estimation, which risks the curse of dimensionality when estimating a huge number of model parameters, we focus our attention on estimating a manageable number of latent variables of DNNs via a VB inference framework. To accomplish model transfer, knowledge learnt from a source domain is encoded in prior distributions of latent variables and optimally combined, in a Bayesian sense, with a small set of adaptation data from a target domain to approximate the corresponding posterior distributions. Experimental results on device adaptation in acoustic scene classification show that our proposed VB approach yields clear improvements on target devices and consistently outperforms 13 state-of-the-art knowledge transfer algorithms.
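The Bayesian combination of a source-domain prior with a small amount of target-domain adaptation data can be illustrated with a minimal sketch. The example below assumes a single scalar latent variable with a conjugate Gaussian prior and known observation noise variance, so the posterior is available in closed form; the function name and all numerical values are illustrative, not from the paper.

```python
import numpy as np

def gaussian_posterior_update(prior_mean, prior_var, data, noise_var):
    """Combine a source-domain Gaussian prior over a latent variable
    with a small set of target-domain adaptation data.

    The posterior mean interpolates between the prior mean and the
    adaptation-data mean, weighted by their respective precisions, so
    a small adaptation set shifts the estimate only as far as the data
    evidence warrants (illustrative conjugate update, not the paper's
    full VB inference over DNN latent variables)."""
    n = len(data)
    prior_prec = 1.0 / prior_var          # precision of the source-domain prior
    like_prec = n / noise_var             # precision contributed by the data
    post_var = 1.0 / (prior_prec + like_prec)
    post_mean = post_var * (prior_prec * prior_mean + np.sum(data) / noise_var)
    return post_mean, post_var

# Source-domain knowledge as a prior; three target-domain samples.
post_mean, post_var = gaussian_posterior_update(0.0, 1.0,
                                                np.array([1.0, 0.8, 1.2]), 0.5)
```

With only three adaptation samples, the posterior mean (about 0.86) sits between the prior mean (0.0) and the data mean (1.0), and the posterior variance shrinks below the prior variance, mirroring the prior-plus-adaptation-data combination described above.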