PSLA: Improving Audio Event Classification with Pretraining, Sampling, Labeling, and Aggregation

Audio event classification is an active research area and has a wide range of applications. Since the release of AudioSet, great progress has been made in advancing the classification accuracy, which mostly comes from the development of novel model architectures and attention modules. However, we find that appropriate training techniques are equally important for building audio event classification models with AudioSet, but have not received the attention they deserve. To fill the gap, in this work, we present PSLA, a collection of training techniques that can noticeably boost the model accuracy including ImageNet pretraining, balanced sampling, data augmentation, label enhancement, model aggregation and their design choices. By training an EfficientNet with these techniques, we obtain a model that achieves a new state-of-the-art mean average precision (mAP) of 0.474 on AudioSet, outperforming the previous best system of 0.439.
