The use of machine learning (ML) techniques has become increasingly popular in bioacoustics in recent years. Successful application of ML-based techniques fundamentally requires curated, agreed-upon, high-quality datasets and benchmark tasks to be learned on a given dataset. However, the field of bioacoustics so far lacks public benchmarks that cover multiple tasks and species, measure the performance of ML techniques in a controlled and standardized way, and allow newly proposed techniques to be compared against existing ones. Here, we propose BEANS (the BEnchmark of ANimal Sounds), a collection of bioacoustics tasks and public datasets specifically designed to measure the performance of machine learning algorithms in the field of bioacoustics. The benchmark consists of two common tasks in bioacoustics: classification and detection. It includes 12 datasets covering various species, including birds, land and marine mammals, anurans, and insects. In addition to the datasets, we also present the performance of a set of standard ML methods as baselines for task performance. The benchmark and baseline code are made publicly available at \url{https://github.com/earthspecies/beans} in the hope of establishing a new standard dataset for ML-based bioacoustic research.