We present a method for removing the unknown convolutive noise that reverberation in the recording environment introduces into speech. The method uses some training speech from the reverberant environment together with any available non-reverberant speech. Using a Fourier transform computed over long temporal windows, which ideally cover the entire room impulse response, we convert the room-induced convolution into an addition in the log-spectral domain. Next, we compute a spectral normalization vector from statistics gathered over both reverberant and clean speech in the log-spectral domain. During operation, this normalization vector is used to reduce reverberation in complex speech spectra recorded under the same reverberant conditions. The dereverberated complex speech spectra are then used to compute complex FDLP-spectrograms for use in automatic speech recognition.
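The normalization step can be illustrated with a minimal NumPy sketch. The abstract gives no formulas, so everything specific here is an assumption: the normalization vector is taken to be the difference between the mean log-magnitude spectra of reverberant and clean speech, only the magnitude is corrected while the phase is left untouched, and the function names, window length, hop size, and synthetic signals are hypothetical stand-ins rather than the authors' implementation.

```python
import numpy as np


def long_window_spectra(signal, win_len, hop):
    """Complex spectra of Hann-windowed frames.

    win_len should be long enough to span the room impulse response, so that
    convolution with the room is approximately multiplication per frame.
    """
    starts = range(0, len(signal) - win_len + 1, hop)
    window = np.hanning(win_len)
    frames = np.stack([signal[s:s + win_len] * window for s in starts])
    return np.fft.rfft(frames, axis=1)


def mean_log_spectrum(spectra, eps=1e-10):
    """Per-frequency mean of the log-magnitude spectrum over all frames."""
    return np.mean(np.log(np.abs(spectra) + eps), axis=0)


def normalization_vector(reverb_spectra, clean_spectra):
    """Assumed form: mean log spectrum (reverberant) minus mean log spectrum (clean).

    Only statistics are used, so the clean data need not be a parallel
    recording of the reverberant data.
    """
    return mean_log_spectrum(reverb_spectra) - mean_log_spectrum(clean_spectra)


def dereverberate(spectra, norm_vec):
    """Subtract norm_vec in the log-magnitude domain; phase is left unchanged."""
    return spectra * np.exp(-norm_vec)


# Synthetic stand-ins: white noise as "speech", a decaying noise burst as the room.
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
rir = rng.standard_normal(2000) * np.exp(-np.arange(2000) / 300.0)
reverb = np.convolve(clean, rir)[:len(clean)]

win_len, hop = 4096, 1024  # hypothetical values; the window exceeds the 2000-tap response
clean_spec = long_window_spectra(clean, win_len, hop)
reverb_spec = long_window_spectra(reverb, win_len, hop)

norm_vec = normalization_vector(reverb_spec, clean_spec)
derev_spec = dereverberate(reverb_spec, norm_vec)
```

In this sketch `derev_spec` remains complex-valued, which matches the abstract's point that the dereverberated complex spectra are passed on to the FDLP-spectrogram computation; that later stage is not shown here.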