This work explores class-incremental learning (CIL) for sound event detection (SED), advancing adaptability towards real-world scenarios. CIL's success in domains like computer vision inspired our SED-tailored method, addressing the unique challenges of diverse and complex audio environments. Our approach employs an independent unsupervised learning framework with a distillation loss function to integrate new sound classes while preserving the SED model consistency across incremental tasks. We further enhance this framework with a sample selection strategy for unlabeled data and a balanced exemplar update mechanism, ensuring varied and illustrative sound representations. Evaluating various continual learning methods on the DCASE 2023 Task 4 dataset, we find that our research offers insights into each method's applicability for real-world SED systems that can have newly added sound classes. The findings also delineate future directions of CIL in dynamic audio settings.
翻译:暂无翻译