Recent literature has demonstrated that per-channel energy normalization (PCEN) yields significant performance improvements over traditional log-scaled mel-frequency spectrograms for acoustic sound event detection (SED) in a multi-class setting with overlapping events. However, the configuration of PCEN's parameters is sensitive to the recording environment, the characteristics of the class of events of interest, and the presence of multiple overlapping events. Consequently, a given configuration yields improvements on a class-by-class basis but poor cross-class performance. In this article, we evaluate PCEN spectrograms as an alternative input representation for SED in urban audio using the UrbanSED dataset, and demonstrate per-class improvements that depend on the parameter configuration. Furthermore, we address cross-class performance with a novel method, Multi-Rate PCEN (MRPCEN). We evaluate MRPCEN on cross-class SED and demonstrate improved cross-class performance compared to traditional single-rate PCEN.
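For orientation, the following is a minimal sketch of single-rate PCEN in its commonly published form: a first-order IIR smoother M over time, followed by adaptive gain control and root compression, PCEN = (E / (eps + M)^alpha + delta)^r - delta^r. The function names, the default parameter values, and in particular the `multi_rate_pcen` helper are illustrative assumptions. The abstract does not specify how MRPCEN combines rates, so stacking several smoothing coefficients as channels is only one plausible reading, not necessarily the method evaluated here.

```python
import numpy as np


def pcen(E, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Single-rate PCEN of a non-negative (n_bands, n_frames) mel energy array.

    Parameter defaults are illustrative, not tuned values from this article.
    """
    E = np.asarray(E, dtype=float)
    M = np.empty_like(E)
    M[:, 0] = E[:, 0]  # initialise the smoother with the first frame
    for t in range(1, E.shape[1]):
        # First-order IIR low-pass filter along time, per frequency band.
        M[:, t] = (1.0 - s) * M[:, t - 1] + s * E[:, t]
    # Adaptive gain control (division by M^alpha), then root compression.
    return (E / (eps + M) ** alpha + delta) ** r - delta ** r


def multi_rate_pcen(E, rates=(0.015, 0.05, 0.2), **kwargs):
    """Hypothetical multi-rate variant: one PCEN channel per smoothing rate.

    ASSUMPTION: stacking along a new leading axis is a guess at what
    "multi-rate" means; the rates themselves are arbitrary examples.
    """
    return np.stack([pcen(E, s=s, **kwargs) for s in rates], axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mel = rng.random((64, 100))        # stand-in for a mel energy spectrogram
    print(pcen(mel).shape)             # (64, 100)
    print(multi_rate_pcen(mel).shape)  # (3, 64, 100)
```

A small smoothing coefficient `s` tracks slowly varying background noise, while a larger one adapts faster and suppresses longer events. This trade-off is one way to read the class-dependent parameter sensitivity described above, and it is why a stack of rates might serve several event classes at once.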