This paper presents a context-aware framework for feature selection and classification to realize fast and accurate audio event annotation and classification. The context-aware design starts with an exploration of feature extraction techniques to find a combination that yields high classification accuracy with minimal computational effort. This exploration also covers the audio Tempo representation, an advantageous feature extraction method overlooked by previous work on environmental audio classification. The proposed annotation method considers outlier, inlier, and hard-to-predict data samples to realize context-aware Active Learning, reaching an average accuracy of 90% when only 15% of the data carries an initial annotation. Our proposed sound classification algorithm achieves an average prediction accuracy of 98.05% on the UrbanSound8K dataset. The notebooks containing our source code and implementation results are available at https://github.com/gitmehrdad/FACE.
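To illustrate the Tempo representation named above, the following minimal Python sketch estimates a clip's global tempo and appends it to a conventional feature vector. It assumes the librosa library and the file name `dog_bark.wav`; the paper's exact extraction pipeline may differ, so treat this as a sketch rather than the authors' implementation.

```python
# Minimal sketch of extracting a Tempo feature (assumed: librosa toolchain).
import librosa
import numpy as np

def extract_tempo(path: str) -> np.ndarray:
    # Load the clip at its native sampling rate.
    y, sr = librosa.load(path, sr=None)
    # The onset strength envelope drives the tempo estimator.
    onset_env = librosa.onset.onset_strength(y=y, sr=sr)
    # Estimate the global tempo (beats per minute) of the clip.
    return librosa.beat.tempo(onset_envelope=onset_env, sr=sr)

# Hypothetical usage: concatenate the tempo estimate with averaged MFCCs
# to form a combined feature vector for a classifier.
y, sr = librosa.load("dog_bark.wav", sr=None)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40).mean(axis=1)
features = np.concatenate([mfcc, extract_tempo("dog_bark.wav")])
```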