Recent end-to-end deep learning models have been shown to match or exceed the performance of state-of-the-art Recurrent Neural Networks (RNNs) on automatic speech recognition tasks. These models tend to be lighter weight and require less training time than traditional RNN-based approaches. However, they take a frequentist approach to weight training. In theory, network weights are drawn from a latent, intractable probability distribution. We introduce BayesSpeech for end-to-end automatic speech recognition. BayesSpeech is a Bayesian Transformer network in which these intractable posteriors are learned through variational inference and the local reparameterization trick, without recurrence. We show how the introduction of variance in the weights leads to faster training time and near state-of-the-art performance on LibriSpeech-960.
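The local reparameterization trick mentioned in the abstract can be illustrated with a minimal sketch. Instead of sampling a weight matrix from its Gaussian posterior, the trick samples the layer's pre-activations directly, whose mean and variance follow from the posterior parameters; this reduces gradient variance and avoids materializing per-example weight samples. The sketch below assumes a single fully-connected layer with a fully-factorized Gaussian weight posterior; the function name and parameters are illustrative, not taken from the paper.

```python
import numpy as np

def bayes_linear_lrt(x, weight_mu, weight_sigma, rng):
    """Bayesian linear layer using the local reparameterization trick.

    x:            input batch, shape (batch, in_features)
    weight_mu:    posterior means,  shape (in_features, out_features)
    weight_sigma: posterior stddevs, same shape as weight_mu (> 0)
    """
    # Pre-activation mean and variance implied by the factorized
    # Gaussian posterior over the weights.
    act_mu = x @ weight_mu
    act_var = (x ** 2) @ (weight_sigma ** 2)
    # Sample the activations, not the weights.
    eps = rng.standard_normal(act_mu.shape)
    return act_mu + np.sqrt(act_var) * eps
```

In a full variational-inference setup, the posterior parameters would be trained by maximizing the evidence lower bound, i.e. the data log-likelihood minus a KL term between the weight posterior and prior.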