Audio processing methods operating on a time-frequency representation of the signal can introduce unpleasant-sounding artifacts known as musical noise. These artifacts are observed in the context of audio coding, speech enhancement, and source separation. The change in kurtosis of the power spectrum introduced during processing was shown to correlate with the human perception of musical noise in the context of speech enhancement, leading to the proposal of measures based on it. These baseline measures are shown here to correlate with human perception only to a limited extent. As ground truth for human perception, the results of two listening tests are considered: one involving audio coding and one involving source separation. Simple but effective perceptually motivated improvements are proposed, and the resulting new measure is shown to clearly outperform the baselines in terms of correlation with the results of both listening tests. Moreover, for the listening test on musical noise in audio coding, the correlation is nearly as good as that of the Artifact-related Perceptual Score (APS), which was found to be the best objective measure for this task. The APS is, however, computationally very expensive. The proposed measure is easily computed, requiring only a fraction of the computational cost of the APS.
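To make the kurtosis-based baseline concrete, the following is a minimal sketch of one way a change in kurtosis of the power spectrum could be computed. It is not the measure proposed here, and several choices are assumptions not stated in the abstract: the Hann-windowed STFT with `frame_len=512` and `hop=256`, pooling all time-frequency bins into a single kurtosis value, using the non-excess sample kurtosis, and expressing the change as a log ratio. The function names are hypothetical.

```python
import numpy as np


def power_spectrogram(x, frame_len=512, hop=256):
    """Power spectrogram (frames x bins) from a Hann-windowed STFT.

    frame_len and hop are illustrative defaults, not values from the paper.
    """
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def kurtosis(values):
    """Non-excess sample kurtosis: fourth central moment over squared variance."""
    v = np.ravel(values)
    centered = v - v.mean()
    m2 = np.mean(centered ** 2)
    m4 = np.mean(centered ** 4)
    return m4 / m2 ** 2


def log_kurtosis_ratio(reference, processed, frame_len=512, hop=256):
    """Log of (kurtosis after processing / kurtosis before processing).

    Assumption: kurtosis is pooled over all time-frequency bins.
    """
    k_ref = kurtosis(power_spectrogram(reference, frame_len, hop))
    k_proc = kurtosis(power_spectrogram(processed, frame_len, hop))
    return np.log(k_proc / k_ref)
```

A value of zero means the processing left the pooled kurtosis unchanged, and a positive value means the kurtosis increased. Because kurtosis is scale invariant, a pure gain change does not affect the result.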