In this paper, we introduce UnFuSeD, a novel approach that leverages self-supervised learning (SSL) to reduce the need for large amounts of labeled data in audio classification. Unlike prior work, which directly fine-tunes a self-supervised pre-trained encoder on a target dataset, we use the encoder to generate pseudo-labels for an unsupervised fine-tuning stage that precedes the actual fine-tuning step. We first train an encoder with a novel self-supervised learning algorithm on an unlabeled audio dataset. We then use that encoder to generate pseudo-labels on the target task dataset by clustering the extracted representations. These pseudo-labels guide self-distillation on a randomly initialized model, a step we call unsupervised fine-tuning. Finally, the resulting encoder is fine-tuned on the target task dataset. Through UnFuSeD, we propose the first system that moves away from the generic SSL paradigm in the literature, which pre-trains and fine-tunes the same encoder, and present a novel self-distillation-based system that leverages SSL pre-training for low-resource audio classification. In practice, UnFuSeD achieves state-of-the-art results on the LAPE Benchmark, significantly outperforming all our baselines, while using 40% fewer parameters than the previous state-of-the-art system. We make all our code publicly available.
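To make the pipeline concrete, the following is a minimal sketch of the pseudo-labeling and unsupervised fine-tuning stages described above, not the authors' implementation. The `ssl_encoder`, the toy data, and the number of clusters are all hypothetical stand-ins chosen only for illustration.

```python
# Minimal sketch of an UnFuSeD-style pipeline (assumed, not the official code).
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

# Stand-ins for the SSL-pre-trained encoder and the unlabeled target-task clips.
ssl_encoder = nn.Sequential(nn.Flatten(), nn.Linear(16000, 128))  # hypothetical encoder
target_audio = torch.randn(256, 16000)   # 256 toy audio clips
num_pseudo_classes = 10                  # assumed number of clusters

# Step 1: generate pseudo-labels by clustering the SSL encoder's representations.
with torch.no_grad():
    embeddings = ssl_encoder(target_audio).numpy()
pseudo_labels = torch.as_tensor(
    KMeans(n_clusters=num_pseudo_classes, n_init=10).fit_predict(embeddings),
    dtype=torch.long,
)

# Step 2: "unsupervised fine-tuning" — train a randomly initialized model
# to predict the pseudo-labels (the self-distillation target).
student = nn.Sequential(
    nn.Flatten(), nn.Linear(16000, 64), nn.ReLU(), nn.Linear(64, num_pseudo_classes)
)
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
for _ in range(5):  # a few illustrative epochs
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(student(target_audio), pseudo_labels)
    loss.backward()
    optimizer.step()

# Step 3: the resulting `student` would then be fine-tuned on the labeled
# target-task data in the usual supervised manner (omitted here).
```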