PSLA: 改进预培训、抽样、标签和聚合方面的音频拖网 (PSLA: Improving Audio Tagging with Pretraining, Sampling, Labeling, and Aggregation)

Audio tagging is an active research area and has a wide range of applications. Since the release of AudioSet, great progress has been made in advancing model performance, which mostly comes from the development of novel model architectures and attention modules. However, we find that appropriate training techniques are equally important for building audio tagging models with AudioSet, but have not received the attention they deserve. To fill the gap, in this work, we present PSLA, a collection of training techniques that can noticeably boost the model accuracy including ImageNet pretraining, balanced sampling, data augmentation, label enhancement, model aggregation and their design choices. By training an EfficientNet with these techniques, we obtain a single model (with 13.6M parameters) and an ensemble model that achieve mean average precision (mAP) scores of 0.444 and 0.474 on AudioSet, respectively, outperforming the previous best system of 0.439 with 81M parameters. In addition, our model also achieves a new state-of-the-art mAP of 0.567 on FSD50K.

翻译：音频标记是一个活跃的研究领域,应用范围很广。自《AudioSet》发布以来,在推进模型性能方面取得了很大进展,这主要来自开发新型模型结构和关注模块。然而,我们发现,适当的培训技术对于用《AudioSet》建立音频标记模型同样重要,但没有得到应有的重视。为了填补这一空白,我们向PSLA展示了一批培训技术,这些技术可以明显提高模型准确性,包括图像网络预培训、均衡抽样、数据增强、标签强化、模型汇总及其设计选择。通过以这些技术培训高效网络,我们获得了一个单一模型(13.6M参数)和一个组合模型,在音频Set上达到平均平均精确分数0.444和0.474,这分别超过了以往具有81M参数的0.439最佳系统。此外,我们的模型还在FSD50K上实现了一个新的最先进的MAP,即0.567。

相关内容

MoDELS

关注 43

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

神经机器翻译的域自适应综述论文，64页pdf

专知会员服务

17+阅读 · 2021年4月16日

Python图像处理，366页pdf，Image Operators Image Processing in Python

专知会员服务

78+阅读 · 2020年7月23日

【Google】平滑对抗训练，Smooth Adversarial Training

专知会员服务

49+阅读 · 2020年7月4日

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

【ACL2020】对抗性文本生成，Improving Adversarial Text Generation

专知会员服务

52+阅读 · 2020年5月5日

【CVPR2020】视觉跟踪的概率回归，Probabilistic Regression for Visual Tracking

专知会员服务

37+阅读 · 2020年3月27日

【医学图像处理中的因果性】52页ppt，Causality Matters in Medical Imaging

专知会员服务

60+阅读 · 2020年3月14日

【Yoshua Bengio新论文】多任务自监督学习语音识别，MULTI-TASK SELF-SUPERVISED LEARNING FOR ROBUST SPEECH RECOGNITION

专知会员服务

39+阅读 · 2020年1月30日

【ICLR2020 预训练的百科全书】弱监督的知识-预训练的语言模型（PRETRAINED ENCYCLOPEDIA: WEAKLY SUPERVISED KNOWLEDGE-PRETRAINED LANGUAGE MODEL）

专知会员服务

25+阅读 · 2019年12月26日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

【论文笔记】通俗理解少样本文本分类 (Few-Shot Text Classification) (1)

深度学习自然语言处理

7+阅读 · 2020年4月8日

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

(TensorFlow)实时语义分割比较研究

机器学习研究会

9+阅读 · 2018年3月12日

【论文推荐】最新七篇目标检测相关论文—Self Paced、上下文注意力、特征反射、层次特征、Tiny SSD、少样本、协同学习

专知

6+阅读 · 2018年2月25日

【论文推荐】最新5篇图像分割（Image Segmentation）相关论文—多重假设、超像素分割、自监督、图、生成对抗网络

专知

27+阅读 · 2018年2月7日

【学习】(Python)SVM数据分类

机器学习研究会

6+阅读 · 2017年10月15日

【学习】Hierarchical Softmax

机器学习研究会

4+阅读 · 2017年8月6日

Robust Training of Social Media Image Classification Models for Rapid Disaster Response

Arxiv

0+阅读 · 2021年7月19日

Encoders and Ensembles for Task-Free Continual Learning

Arxiv

0+阅读 · 2021年7月19日

A Streaming End-to-End Framework For Spoken Language Understanding

Arxiv

0+阅读 · 2021年7月17日

$GI-NNet \& RGI-NNet: Development of Robotic Grasp Pose Models, Trainable with Large as well as Limited Labelled Training Datasets, under supervised and semi supervised paradigms$

GI-NNet \& RGI-NNet: Development of Robotic Grasp Pose Models, Trainable with Large as well as Limited Labelled Training Datasets, under supervised and semi supervised paradigms

Arxiv

0+阅读 · 2021年7月15日

General Instance Distillation for Object Detection

Arxiv

9+阅读 · 2021年3月3日

Big Self-Supervised Models are Strong Semi-Supervised Learners

Arxiv

6+阅读 · 2020年10月26日

PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization

Arxiv

17+阅读 · 2020年6月2日