Several methods have been developed to assess the perceptual quality of audio under transformations such as lossy compression. However, they require paired reference signals of the unaltered content, which limits their use in applications where references are unavailable. This has hindered progress in audio generation and style transfer, where a no-reference quality assessment method would allow more reproducible comparisons across methods. We propose training a GAN on a large music library and using its discriminator as a no-reference measure of the perceived quality of music. This method is unsupervised, needs no access to degraded material, and can be tuned for various domains of music. In a listening test with 448 human subjects, participants rated professionally produced music tracks degraded with different levels and types of signal degradation, such as waveshaping distortion and low-pass filtering, yielding a dataset of human-rated material. Using this dataset, we show that the discriminator score correlates significantly with the subjective ratings, suggesting that the proposed method can be used to create a no-reference musical audio quality assessment measure.
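The evaluation step described above, comparing discriminator scores against subjective ratings, can be illustrated with a rank correlation. The following is a minimal sketch, not the authors' procedure: the abstract does not say which correlation statistic was used, so Spearman's rank correlation is an assumption here, and the score and rating values are made-up placeholders rather than data from the listening test.

```python
from statistics import mean


def rank(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


# Hypothetical placeholder values, one entry per degraded track:
# a discriminator output and the mean rating given by listeners.
discriminator_scores = [0.91, 0.74, 0.52, 0.33, 0.12]
mean_ratings = [4.6, 4.1, 3.0, 2.2, 1.4]

rho = spearman(discriminator_scores, mean_ratings)
print(f"Spearman rho = {rho:.3f}")
```

A rank-based statistic is a natural choice for this kind of comparison because it only asks whether the score orders the tracks the way listeners do, without assuming a linear relation between discriminator output and rating scale.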