With increasing amounts of music being digitally transferred from production to distribution, automatic means of determining media quality are needed. Protection mechanisms in digital audio processing tools have not eliminated the need for production entities located downstream in the distribution chain to assess audio quality and detect defects introduced further upstream. Such analysis often relies on the received audio and scarce metadata alone. Both the deliberate use of artefacts such as clicks in popular music and more recent defects stemming from corruption in modern audio encodings call for data-centric and context-sensitive detection methods. We present a convolutional network architecture following an end-to-end encoder-decoder configuration and use it to develop detectors for two exemplary audio defects. A click detector is trained and compared to a traditional signal processing method, with a discussion of context sensitivity. Additional post-processing is used for data augmentation and workflow simulation. The ability of our models to capture variance is explored in a detector for artefacts from the decompression of corrupted MP3-compressed audio. For both tasks we describe the synthetic generation of artefacts for controlled detector training and evaluation. We evaluate our detectors on the large open-source Free Music Archive (FMA) and genre-specific datasets.
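As an illustration of what synthetic click generation for controlled training might look like, the following minimal sketch adds short impulsive disturbances to a clean signal and returns a per-sample label mask. It is not the procedure used in this work: the function name `add_synthetic_clicks`, its parameters (`n_clicks`, `max_width`, `amplitude`, `seed`), the rectangular click shape, and the per-sample labelling are all assumptions made for the example.

```python
import numpy as np


def add_synthetic_clicks(audio, n_clicks=5, max_width=4, amplitude=0.8, seed=0):
    """Return (corrupted_audio, label_mask) for a mono float signal in [-1, 1].

    Hypothetical helper: click shape, width and amplitude are illustrative
    assumptions, not values taken from the paper.
    """
    rng = np.random.default_rng(seed)
    corrupted = np.array(audio, dtype=np.float64, copy=True)
    labels = np.zeros(corrupted.shape[0], dtype=np.int8)

    # Draw distinct click onsets, leaving room for the widest click.
    starts = rng.choice(corrupted.shape[0] - max_width, size=n_clicks, replace=False)
    for start in starts:
        width = int(rng.integers(1, max_width + 1))  # 1..max_width samples
        sign = rng.choice([-1.0, 1.0])               # random click polarity
        corrupted[start:start + width] += sign * amplitude
        labels[start:start + width] = 1              # mark affected samples

    # Keep the result in the valid range for floating-point audio.
    np.clip(corrupted, -1.0, 1.0, out=corrupted)
    return corrupted, labels


# Usage: one second of a 440 Hz tone stands in for a clean music excerpt.
sr = 44100
t = np.arange(sr) / sr
clean = 0.3 * np.sin(2 * np.pi * 440.0 * t)
noisy, mask = add_synthetic_clicks(clean, n_clicks=5)
```

Because the clean signal and the label mask are both known, a detector's output can be scored sample by sample, which is the kind of controlled evaluation that synthetic artefacts make possible.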