Bayesian estimation of the short-time spectral amplitude (STSA) is one of the predominant approaches to enhancing noise-corrupted speech. The performance of these estimators usually improves significantly when a perceptually relevant cost function is used. On the other hand, recent progress in phase-based speech signal processing has shown that phase-only enhancement based on spectral phase estimation can also jointly improve perceived speech quality and intelligibility, even in low-SNR conditions. In this paper, to take advantage of both a perceptually motivated cost function involving the STSAs of the estimated and true clean speech and prior spectral phase information, we derive a phase-aware Bayesian STSA estimator. The parameters of the cost function are chosen based on characteristics of the human auditory system, namely the dynamic compressive nonlinearity of the cochlea, perceived loudness theory, and the simultaneous masking properties of the ear. This parameter selection scheme yields more noise reduction while limiting speech distortion. The derived STSA estimator is optimal in the MMSE sense if the prior phase information is available. In practice, however, typically only an estimate of the clean speech phase can be obtained, using one of the spectral phase estimation techniques developed over the last few years. In a blind setup, we evaluate the proposed Bayesian STSA estimator with several standard phase estimation methods from the literature. Experimental results show that the proposed estimator achieves substantially better performance than traditional phase-blind approaches.