Audio recognition and classification are used for many tasks and applications, including human voice recognition, music recognition, and audio tagging. In this paper, we apply Mel Frequency Cepstral Coefficients (MFCC) in combination with a range of machine learning models to identify (Australian) birds from publicly available audio files of their birdsong. We present the approaches used for data processing and augmentation and compare the results of various state-of-the-art machine learning models. We achieve an overall accuracy of 91% for the top-5 birds from the 30 selected as the case study. Applying the models to more challenging and diverse audio files comprising 152 bird species, we achieve an accuracy of 58%.
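To make the MFCC feature concrete, the following is a minimal sketch of how the coefficients for a single audio frame might be computed: window the frame, take its power spectrum, pool it through triangular filters spaced on the mel scale, take logarithms, and apply a DCT. It is an illustration only and not the implementation used in the paper. The function names, the 16 kHz sample rate, the 256-sample frame, the 20 filters, and the 13 coefficients are all assumptions chosen for the example, and the synthetic sine tones merely stand in for birdsong recordings.

```python
import cmath
import math


def hz_to_mel(hz):
    """Convert frequency in Hz to the mel scale (common 2595*log10 formula)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Hamming-window one frame and return its one-sided power spectrum (naive DFT)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale, one row per filter."""
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sample_rate / 2)
    mel_points = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    # Filter edges expressed as (fractional) DFT bin positions.
    edges = [mel_to_hz(m) * n_fft / sample_rate for m in mel_points]
    bank = []
    for j in range(1, n_filters + 1):
        left, centre, right = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for b in range(n_bins):
            if left < b <= centre:
                row.append((b - left) / (centre - left))
            elif centre < b < right:
                row.append((right - b) / (right - centre))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc_frame(frame, sample_rate, n_filters=20, n_coeffs=13):
    """MFCCs of one frame: log mel-filterbank energies followed by a DCT-II."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Floor the energies so the logarithm is always defined.
    log_energies = [math.log(max(sum(w * p for w, p in zip(row, spec)), 1e-10))
                    for row in bank]
    return [sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters)
                for m, e in enumerate(log_energies))
            for c in range(n_coeffs)]


def tone(freq_hz, sample_rate=16000, n=256):
    """Synthetic sine tone, a stand-in for a frame of real audio."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


low = mfcc_frame(tone(2000.0), 16000)
high = mfcc_frame(tone(6000.0), 16000)
print(len(low), len(high))
```

In a full pipeline, such per-frame vectors would typically be computed over many overlapping frames and then passed, individually or aggregated, to a classifier. In practice an optimized audio library would normally replace the naive DFT shown here.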