Speech recognition in far-field environments is adversely affected by reverberation artifacts, which manifest as temporal smearing of the sub-band envelopes. In this paper, we develop a neural model for speech dereverberation that operates on the long-term sub-band envelopes of speech. The sub-band envelopes are derived using frequency domain linear prediction (FDLP), which performs an autoregressive estimation of the Hilbert envelopes. The neural dereverberation model estimates an envelope gain which, when applied to the reverberant signal, suppresses the late reflection components in the far-field signal. The dereverberated envelopes are then used for feature extraction in speech recognition. Further, the sequence of steps involved in envelope dereverberation, feature extraction and acoustic modeling for ASR can be implemented as a single neural processing pipeline, which allows joint learning of the dereverberation network and the acoustic model. Several experiments are performed on the REVERB challenge dataset, the CHiME-3 dataset and the VOiCES dataset. In these experiments, the joint learning of envelope dereverberation and the acoustic model yields significant performance improvements over the baseline ASR system based on log-mel spectrogram features, as well as over other past approaches to dereverberation (average relative improvements of 10-24% over the baseline system). A detailed analysis of the choice of hyper-parameters and of the cost function involved in envelope dereverberation is also provided.
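As an illustration of the FDLP step mentioned above, the following is a minimal sketch of full-band envelope estimation, not the implementation used in the paper. It assumes the textbook formulation: take the DCT of a long signal segment, fit an autoregressive model to the DCT coefficients with the Yule-Walker equations, and read the AR power response as a smooth approximation of the squared Hilbert envelope. The function name `fdlp_envelope`, the model order and the number of envelope points are hypothetical choices; the sub-band windowing of the DCT coefficients and the neural gain estimation are omitted.

```python
import numpy as np
from scipy.fft import dct
from scipy.linalg import solve_toeplitz
from scipy.signal import freqz


def fdlp_envelope(x, order=20, n_points=200):
    """Approximate the squared temporal envelope of x via linear prediction
    on its DCT coefficients (hypothetical helper, full-band only)."""
    # Frequency-domain representation of the segment.
    c = dct(np.asarray(x, dtype=float), type=2, norm="ortho")

    # Autocorrelation of the DCT sequence for lags 0..order.
    r = np.array([np.dot(c[: len(c) - k], c[k:]) for k in range(order + 1)])
    r[0] *= 1.0 + 1e-9  # small regularisation for numerical stability

    # Yule-Walker equations: Toeplitz(r[0..order-1]) @ a = r[1..order].
    a = solve_toeplitz(r[:order], r[1 : order + 1])

    # Prediction error power, used as the gain of the AR model.
    err = r[0] - np.dot(a, r[1 : order + 1])

    # The AR power response over [0, pi) maps to time [0, len(x)).
    _, h = freqz([1.0], np.concatenate(([1.0], -a)), worN=n_points)
    return err * np.abs(h) ** 2


if __name__ == "__main__":
    # Synthetic test signal: a noise burst centred at 70% of the segment.
    rng = np.random.default_rng(0)
    n = 2000
    t = np.arange(n)
    burst = np.exp(-0.5 * ((t - 0.7 * n) / (0.02 * n)) ** 2)
    env = fdlp_envelope(rng.standard_normal(n) * burst)
    print("envelope peak at fraction", np.argmax(env) / len(env))
```

Because linear prediction is applied in the frequency domain, the model poles track peaks of the temporal envelope rather than spectral peaks, which is why the AR response is read along the time axis. In a sub-band setting the same fit would be applied to windowed portions of the DCT sequence, one per band.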