An adversary is essentially an algorithm intent on making a classification system perform in some particular way given an input, e.g., increase the probability of a false negative. Recent work builds adversaries for deep learning systems applied to image object recognition, exploiting the parameters of the system to find the minimal perturbation of the input image such that the network misclassifies it with high confidence. We adapt this approach to construct and deploy an adversary of deep learning systems applied to music content analysis. In our case, however, the input to the systems consists of magnitude spectral frames, which requires special care in order to produce valid input audio signals from network-derived perturbations. For two different train-test partitionings of two benchmark datasets, and two different deep architectures, we find that this adversary is very effective in defeating the resulting systems. We find, however, that the convolutional networks are more robust than systems based on a majority vote over individually classified audio frames. Furthermore, we integrate the adversary into the training of new deep systems, but do not find that this improves their resilience against the same adversary.
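The core idea of a gradient-based adversary can be sketched in its simplest form. This is a minimal FGSM-style illustration on a toy logistic-regression classifier, not the paper's actual method for spectral frames; all names and parameters here are hypothetical:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, w, b, y, eps):
    """Shift input x by eps in the sign of the loss gradient w.r.t. x,
    pushing the classifier away from the true label y (0 or 1)."""
    p = sigmoid(w @ x + b)
    # For cross-entropy loss, the gradient w.r.t. the input is (p - y) * w.
    grad = (p - y) * w
    return x + eps * np.sign(grad)

rng = np.random.default_rng(0)
w = rng.normal(size=8)
b = 0.0
x = w / np.linalg.norm(w)   # an input the model classifies confidently as class 1
y = 1

x_adv = fgsm_perturb(x, w, b, y, eps=0.5)
# The perturbed input receives strictly lower confidence for the true class.
print(sigmoid(w @ x_adv + b) < sigmoid(w @ x + b))  # → True
```

In the audio setting described above, an extra projection step would be needed so that the perturbed magnitude spectral frames still correspond to a valid time-domain signal, which is the "special care" the abstract refers to.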