Spiking Neural Networks (SNNs) offer energy-efficient processing suitable for edge applications, but conventional sensor data must first be converted into spike trains before neuromorphic processing. Environmental sound, including urban soundscapes, is challenging to encode because of its variable frequency content, background noise, and overlapping acoustic events, yet most spike-based audio encoding research has focused on speech. This paper analyzes three spike encoding methods, namely Threshold Adaptive Encoding (TAE), Step Forward (SF), and Moving Window (MW), across three datasets: ESC10, UrbanSound8K, and TAU Urban Acoustic Scenes. Our multiband analysis shows that TAE consistently outperforms SF and MW in reconstruction quality, both per frequency band and per class, on all three datasets. Moreover, TAE yields the lowest spike firing rates, indicating superior energy efficiency. For downstream environmental sound classification with a standard SNN, TAE also achieves the best performance among the compared encoders. Overall, this work provides foundational insights and a comparative benchmark to guide the selection of spike encoders for neuromorphic environmental sound processing.
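To make the compared quantities concrete, the following is a minimal sketch of Step Forward encoding as it is commonly described in the spike-encoding literature, together with the two measures the abstract refers to: reconstruction quality (here RMSE) and spike firing rate. It is not necessarily the exact variant evaluated in this paper. The function names, the fixed threshold of 0.25, and the toy signal are illustrative assumptions, and the per-band (multiband) decomposition is omitted.

```python
import math


def step_forward_encode(signal, threshold):
    """Encode a 1-D signal as +1 / -1 / 0 spikes (common SF formulation).

    A spike is emitted whenever the sample leaves a band of width
    `threshold` around a running baseline; the baseline then moves one
    threshold step in the direction of the spike.
    """
    baseline = signal[0]
    spikes = [0]  # the first sample only initialises the baseline
    for sample in signal[1:]:
        if sample > baseline + threshold:
            spikes.append(1)
            baseline += threshold
        elif sample < baseline - threshold:
            spikes.append(-1)
            baseline -= threshold
        else:
            spikes.append(0)
    return spikes


def step_forward_decode(spikes, threshold, start):
    """Reconstruct the signal by accumulating threshold-sized steps."""
    value = start
    reconstruction = []
    for spike in spikes:
        value += spike * threshold
        reconstruction.append(value)
    return reconstruction


def firing_rate(spikes):
    """Fraction of time steps that carry a spike of either polarity."""
    return sum(1 for s in spikes if s != 0) / len(spikes)


def rmse(original, reconstruction):
    """Root-mean-square reconstruction error."""
    squared = [(o - r) ** 2 for o, r in zip(original, reconstruction)]
    return math.sqrt(sum(squared) / len(squared))


# Toy signal and threshold, chosen only for illustration.
signal = [0.0, 0.3, 0.7, 1.0, 0.6, 0.1]
threshold = 0.25

spikes = step_forward_encode(signal, threshold)
reconstruction = step_forward_decode(spikes, threshold, start=signal[0])

print("spikes:        ", spikes)
print("reconstruction:", reconstruction)
print("firing rate:   ", firing_rate(spikes))
print("RMSE:          ", rmse(signal, reconstruction))
```

A smaller threshold generally lowers the reconstruction error but raises the firing rate, which is the trade-off an encoder comparison of this kind has to weigh.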