We investigate the potential of stochastic neural networks for learning effective waveform-based acoustic models. The waveform-based setting, inherent to fully end-to-end speech recognition systems, is motivated by several comparative studies of automatic and human speech recognition that associate standard non-adaptive feature extraction techniques with information loss, which can adversely affect robustness. Stochastic neural networks, on the other hand, are a class of models capable of incorporating rich regularization mechanisms into the learning process. We consider a deep convolutional neural network that first decomposes speech into frequency sub-bands via an adaptive parametric convolutional block in which filters are specified by cosine modulations of compactly supported windows. The network then employs standard non-parametric 1D convolutions to extract relevant spectro-temporal patterns while gradually compressing the structured high-dimensional representation generated by the parametric block. We rely on a probabilistic parametrization of the proposed neural architecture and learn the model using stochastic variational inference. This requires evaluating an analytically intractable integral that defines the Kullback-Leibler divergence term responsible for regularization, for which we propose an effective approximation based on Gauss-Hermite quadrature. Our empirical results demonstrate superior performance of the proposed approach over comparable waveform-based baselines and indicate that it may improve robustness. Moreover, the approach outperforms a recently proposed deep convolutional neural network for learning robust acoustic models with standard FBANK features.
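As a rough illustration of the quadrature step mentioned above, the following sketch approximates a Kullback-Leibler term between a Gaussian variational posterior and a prior whose cross term has no closed form. It is a minimal example under stated assumptions, not the method of the paper: the function names, the one-dimensional setting, and the two-component mixture prior are all hypothetical and chosen only for illustration.

```python
import numpy as np


def gauss_hermite_expectation(f, mu, sigma, n_points=20):
    """Approximate E[f(w)] for w ~ N(mu, sigma^2) with Gauss-Hermite quadrature."""
    # Nodes and weights for the weight function exp(-x^2).
    nodes, weights = np.polynomial.hermite.hermgauss(n_points)
    # Change of variables w = mu + sqrt(2) * sigma * x.
    points = mu + np.sqrt(2.0) * sigma * nodes
    return np.sum(weights * f(points)) / np.sqrt(np.pi)


def kl_gaussian_to_prior(mu, sigma, log_prior, n_points=20):
    """Approximate KL(q || p) for q = N(mu, sigma^2) and a prior log-density."""
    # The negative entropy of q is known in closed form.
    neg_entropy = -0.5 * np.log(2.0 * np.pi * np.e * sigma**2)
    # The cross term E_q[log p(w)] is the part estimated by quadrature.
    cross_term = gauss_hermite_expectation(log_prior, mu, sigma, n_points)
    return neg_entropy - cross_term


def log_normal_prior(w, scale=1.0):
    """Log-density of N(0, scale^2); used here as a sanity check."""
    return -0.5 * np.log(2.0 * np.pi * scale**2) - 0.5 * (w / scale) ** 2


def log_mixture_prior(w, weight=0.5, s1=1.0, s2=0.1):
    """Hypothetical two-component scale mixture (an assumption, not from the paper)."""
    a = np.log(weight) + log_normal_prior(w, s1)
    b = np.log(1.0 - weight) + log_normal_prior(w, s2)
    return np.logaddexp(a, b)


mu, sigma = 0.3, 0.5

# Sanity check against the closed-form KL to a standard normal prior.
kl_quad = kl_gaussian_to_prior(mu, sigma, log_normal_prior)
kl_exact = np.log(1.0 / sigma) + (sigma**2 + mu**2) / 2.0 - 0.5

# A case without a closed form, where quadrature is actually needed.
kl_mix = kl_gaussian_to_prior(mu, sigma, log_mixture_prior)
```

With a Gaussian prior the integrand is a quadratic polynomial, so the quadrature is exact up to floating-point error and can be checked against the closed form. For the mixture prior, the result is only an approximation whose accuracy depends on `n_points`.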