One hour before sunrise, one can experience the dawn chorus, in which birds of different species sing together. In this scenario, high levels of polyphony (i.e., many overlapping sound sources) are likely, which results in a complex acoustic scene. Sound Event Detection (SED) tasks analyze acoustic scenes to identify the events that occur and their respective temporal information. However, highly dense scenarios can be hard to process and have not been studied in depth. Here we show, using a Convolutional Recurrent Neural Network (CRNN), how birdsong events can be detected in scenes of higher polyphony and how effectively this type of model handles a very dense scene with up to 10 overlapping birds. We found that models trained with denser examples (i.e., higher polyphony) learn at a rate similar to that of models trained with simpler samples. Additionally, the model trained with the densest samples maintained a consistent score across all polyphony levels, while the score of the model trained with the least dense samples degraded as polyphony increased. Our results demonstrate that CRNNs can handle highly dense acoustic scenarios. We expect this study to serve as a starting point for work on highly populated bird scenarios such as the dawn chorus, and on other dense acoustic problems.