We investigate a speech enhancement method based on the binaural coherence-to-diffuse power ratio (CDR), which preserves auditory spatial cues for maskers and a broadside target. Conventional CDR estimators typically rely on a mathematical coherence model of the desired signal and/or the diffuse noise field, which may limit their accuracy in natural environments. This work proposes a new, robust, parameterized directional binaural CDR estimator. The estimator is computed in the time-frequency domain and is based on a geometrical interpretation of the spatial coherence function between the binaural microphone signals. We compare the binaural performance of the new CDR estimator with that of three state-of-the-art CDR estimators in cocktail-party-like environments, where it shows improvements in several objective speech quality metrics, such as PESQ and SRMR. We also discuss the benefits of the parameterized CDR estimator for varying sound environments and briefly reflect on several informal subjective evaluations using a low-latency real-time framework.
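For orientation, the following minimal sketch illustrates the kind of conventional, model-based CDR estimate that the abstract contrasts with the proposed estimator; it is not the estimator proposed in this work. It assumes the standard mixing model in which the observed coherence is a CDR-weighted combination of a signal coherence and a noise coherence. The function names (`stft`, `spatial_coherence`, `cdr_from_coherence`) and the parameters `frame_len`, `hop`, and `alpha` are hypothetical choices made for illustration only.

```python
import numpy as np

def stft(x, frame_len=512, hop=256):
    """Hann-windowed STFT; returns an array of shape (frames, bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop:i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)

def spatial_coherence(left, right, alpha=0.9):
    """Complex coherence per time-frequency bin via recursive averaging.

    alpha is an assumed smoothing constant, not a value from the paper.
    """
    L, R = stft(left), stft(right)
    phi_ll = np.zeros(L.shape[1])                 # auto-PSD, left channel
    phi_rr = np.zeros(L.shape[1])                 # auto-PSD, right channel
    phi_lr = np.zeros(L.shape[1], dtype=complex)  # cross-PSD
    coh = np.empty(L.shape, dtype=complex)
    for t in range(L.shape[0]):
        phi_ll = alpha * phi_ll + (1 - alpha) * np.abs(L[t]) ** 2
        phi_rr = alpha * phi_rr + (1 - alpha) * np.abs(R[t]) ** 2
        phi_lr = alpha * phi_lr + (1 - alpha) * L[t] * np.conj(R[t])
        coh[t] = phi_lr / (np.sqrt(phi_ll * phi_rr) + 1e-12)
    return coh

def cdr_from_coherence(coh_x, coh_s, coh_n):
    """Invert coh_x = (CDR * coh_s + coh_n) / (CDR + 1) for CDR.

    coh_s and coh_n are the assumed signal and noise coherence models.
    The real part is kept and negative values are clipped to zero.
    """
    cdr = np.real((coh_n - coh_x) / (coh_x - coh_s + 1e-12))
    return np.maximum(cdr, 0.0)
```

Under these assumptions, a fully coherent broadside target (`coh_s = 1`) mixed with noise of coherence 0.2 at a CDR of 3 yields an observed coherence of 0.8, and `cdr_from_coherence(0.8, 1.0, 0.2)` recovers a value close to 3. The dependence on `coh_s` and `coh_n` is the model reliance that the abstract identifies as a potential weakness in natural environments.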