Spiking neural networks (SNNs), known for their low-power, event-driven computation and intrinsic temporal dynamics, are emerging as promising solutions for processing dynamic, asynchronous signals from event-based sensors. Despite their potential, SNNs face challenges in training and architectural design, resulting in limited performance in challenging event-based dense prediction tasks compared to artificial neural networks (ANNs). In this work, we develop an efficient spiking encoder-decoder network (SpikingEDN) for large-scale event-based semantic segmentation tasks. To enhance the learning efficiency from dynamic event streams, we harness the adaptive threshold which improves network accuracy, sparsity and robustness in streaming inference. Moreover, we develop a dual-path Spiking Spatially-Adaptive Modulation module, which is specifically tailored to enhance the representation of sparse events and multi-modal inputs, thereby considerably improving network performance. Our SpikingEDN attains a mean intersection over union (MIoU) of 72.57\% on the DDD17 dataset and 58.32\% on the larger DSEC-Semantic dataset, showing competitive results to the state-of-the-art ANNs while requiring substantially fewer computational resources. Our results shed light on the untapped potential of SNNs in event-based vision applications. The source code will be made publicly available.
翻译:暂无翻译