We present a novel approach for automatically detecting and classifying great ape calls in continuous raw audio recordings collected during field research. Our method combines deep pretrained and sequential neural networks, including wav2vec 2.0 and LSTM, and is validated on three data sets from three different great ape lineages (orangutans, chimpanzees, and bonobos). The recordings were collected by different researchers and use different annotation schemes, which our pipeline preprocesses and trains on in a uniform fashion. The pipeline attains high accuracy on both call detection and call classification. The method is designed to generalize to other animal species and, more broadly, to other sound event detection tasks. To foster future research, we make our pipeline and methods publicly available.
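As an illustration of what handling different annotation schemes in a uniform fashion can involve, the sketch below converts interval annotations into one label per fixed-length audio frame. It is a minimal sketch under stated assumptions and not the authors' pipeline: the function `frame_labels`, the `label_map` dictionary, the label names, and the 20 ms frame hop are all hypothetical choices made for this example.

```python
from typing import Dict, List, Tuple

# One annotation as (start_ms, end_ms, raw_label); times are integer milliseconds.
# This tuple layout is an assumption made for this sketch.
Annotation = Tuple[int, int, str]


def frame_labels(
    annotations: List[Annotation],
    duration_ms: int,
    label_map: Dict[str, str],
    hop_ms: int = 20,
    background: str = "background",
) -> List[str]:
    """Return one label per frame of `hop_ms` milliseconds.

    `label_map` (hypothetical) maps each annotator's raw label, lowercased
    and stripped, onto a shared label set. Raw labels missing from the map
    fall back to `background`; this is a simplification for the sketch.
    """
    n_frames = duration_ms // hop_ms
    labels = [background] * n_frames
    for start_ms, end_ms, raw in annotations:
        label = label_map.get(raw.strip().lower(), background)
        first = max(0, start_ms // hop_ms)          # first frame touched by the call
        last = min(n_frames, -(-end_ms // hop_ms))  # ceiling division, exclusive end
        for i in range(first, last):
            labels[i] = label
    return labels


# Two hypothetical annotation schemes that name the same call type differently.
label_map = {"lc": "long_call", "long call": "long_call"}
labels = frame_labels([(40, 100, "LC")], duration_ms=200, label_map=label_map)
```

Frame-level labels of this kind let a sequential model treat detection as "call versus background" per frame and classification as "which call type" per frame, regardless of how the original annotations were written.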