Convolutional Neural Networks (CNNs) dominate classification tasks in various domains, such as machine vision, machine listening, and natural language processing. In machine listening, CNNs generally exhibit very good generalization capabilities. However, they are sensitive to the specific audio recording device used, which has been recognized as a substantial problem for acoustic scene classification in the DCASE community. In this study, we investigate the relationship between the over-parameterization of acoustic scene classification models and their resulting generalization abilities. Specifically, we scale CNNs in width and depth under different conditions. Our results indicate that increasing width improves generalization to unseen devices, even without an increase in the number of parameters.