Event cameras offer unparalleled advantages for real-time perception in dynamic environments, thanks to their microsecond-level temporal resolution and asynchronous operation. Existing event-based detectors, however, are limited by fixed-frequency paradigms and fail to fully exploit the high temporal resolution and adaptability of event data. To address these limitations, we propose FlexEvent, a novel framework that enables event-based object detection at varying frequencies. Our approach consists of two key components: FlexFuse, an adaptive event-frame fusion module that integrates high-frequency event data with the rich semantic information of RGB frames, and FlexTune, a frequency-adaptive fine-tuning mechanism that generates frequency-adjusted labels to improve model generalization across operational frequencies. This combination allows our method to detect objects with high accuracy in both fast-moving and static scenarios, while adapting to dynamic environments. Extensive experiments on large-scale event camera datasets demonstrate that our approach surpasses state-of-the-art methods, achieving significant improvements in both standard and high-frequency settings. Notably, our method maintains robust performance when scaling from 20 Hz to 90 Hz and delivers accurate detection up to 180 Hz, proving its effectiveness under extreme conditions. Our framework sets a new benchmark for event-based object detection and paves the way for more adaptable, real-time vision systems.
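To make the event-frame fusion idea concrete, the sketch below shows one common way such a pipeline is set up: asynchronous events are first binned into a fixed-size voxel grid so they share a spatial layout with the frame, then a per-pixel gate weights the two modalities. This is a toy numpy illustration under assumed conventions (events as `(x, y, t, polarity)` tuples, a hand-crafted activity gate), not the paper's actual FlexFuse module, which uses learned fusion.

```python
import numpy as np

def event_voxel_grid(events, H, W, bins):
    """Accumulate asynchronous events into a (bins, H, W) voxel grid.

    events: (N, 4) array of (x, y, t, polarity), polarity in {-1, +1}.
    Timestamps are normalized into `bins` temporal slices, preserving
    some of the high temporal resolution that frames alone lack.
    """
    grid = np.zeros((bins, H, W), dtype=np.float32)
    t = events[:, 2]
    span = max(t.max() - t.min(), 1e-9)
    # Map each timestamp to a bin index in [0, bins - 1].
    b = ((t - t.min()) / span * (bins - 1e-6)).astype(int)
    x = events[:, 0].astype(int)
    y = events[:, 1].astype(int)
    # Unbuffered accumulation: repeated (b, y, x) indices all contribute.
    np.add.at(grid, (b, y, x), events[:, 3])
    return grid

def fuse(event_feat, frame_feat):
    """Toy gated fusion: weight each pixel toward the event stream where
    event activity is high, and toward the frame elsewhere. The real
    module learns this weighting; here it is hand-crafted for clarity."""
    gate = np.abs(event_feat).mean(axis=0)        # (H, W) activity map
    gate = gate / (gate.max() + 1e-9)             # normalize to [0, 1]
    return gate * event_feat.mean(axis=0) + (1.0 - gate) * frame_feat
```

Because the gate falls to zero at pixels with no events, static regions fall back to the frame features, while fast-moving regions lean on the event stream, mirroring the paper's goal of accuracy in both fast-moving and static scenarios.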