Although the prototypical network (ProtoNet) has proven to be an effective method for few-shot sound event detection, two problems remain. First, the support set is too small for the class prototypes to represent the class centers accurately. Second, the feature extractor is task-agnostic (or class-agnostic): it is trained on base-class data and applied directly to unseen-class data. To address these issues, we present a novel mutual learning framework with transductive learning, which iteratively updates the class prototypes and the feature extractor. More specifically, we propose to update the class prototypes with transductive inference so that they lie as close to the true class centers as possible. To make the feature extractor task-specific, we propose to fine-tune it using the updated class prototypes. The fine-tuned feature extractor in turn helps produce better class prototypes. Our method achieves an F-score of 38.4$\%$ on the DCASE 2021 Task 5 evaluation set, ranking first in the few-shot bioacoustic event detection task of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2021 Challenge.
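The transductive prototype update described above can be illustrated with a minimal sketch. The abstract does not specify the update rule, so the version below assumes a common soft k-means style refinement: unlabeled query embeddings are softly assigned to the nearest prototypes, and each prototype is recomputed as a weighted mean of support and query embeddings. The function names (`init_prototypes`, `refine_prototypes`) and the parameters `n_iters` and `temperature` are hypothetical, and the fine-tuning of the feature extractor is omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def init_prototypes(support_emb, support_labels, n_classes):
    # Standard ProtoNet prototype: mean support embedding per class.
    return np.stack(
        [support_emb[support_labels == c].mean(axis=0) for c in range(n_classes)]
    )


def refine_prototypes(prototypes, support_emb, support_labels, query_emb,
                      n_iters=5, temperature=1.0):
    # Assumed update rule (not taken from the paper): soft k-means refinement.
    n_classes = prototypes.shape[0]
    one_hot = np.eye(n_classes)[support_labels]          # (n_support, n_classes)
    emb = np.concatenate([support_emb, query_emb], axis=0)
    for _ in range(n_iters):
        # Squared Euclidean distance from each query to each prototype.
        diff = query_emb[:, None, :] - prototypes[None, :, :]
        dist = (diff ** 2).sum(axis=-1)                  # (n_query, n_classes)
        soft = softmax(-dist / temperature, axis=1)      # soft class assignments
        # Support samples keep hard labels; queries contribute softly.
        weights = np.concatenate([one_hot, soft], axis=0)
        prototypes = (weights.T @ emb) / weights.sum(axis=0)[:, None]
    return prototypes
```

In the mutual learning loop sketched by the abstract, the refined prototypes would then serve as targets for fine-tuning the feature extractor, after which the embeddings are recomputed and the refinement is repeated.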