Most successful computer vision models transform low-level features, such as Gabor filter responses, into richer representations of intermediate or mid-level complexity for downstream visual tasks. Such mid-level representations have not been explored for event cameras, even though they are especially well suited to the visually sparse and often spatially disjoint information in the event stream. Locally consistent intermediate representations, which we term superevents, could benefit numerous visual tasks, including semantic segmentation, visual tracking, and depth estimation. In essence, superevents are perceptually consistent local units that delineate parts of an object in a scene. Inspired by recent deep learning architectures, we present a novel method that employs lifetime augmentation to obtain an event stream representation, which is then fed to a fully convolutional network to extract superevents. Our qualitative and quantitative experiments on several sequences of a benchmark dataset highlight the significant potential of superevents for event-based downstream applications.
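The following is a minimal sketch of the pipeline described above, under several assumptions not specified in the abstract: events are given as (x, y, t, p) tuples, each event's lifetime is approximated by a fixed duration rather than estimated per event, and the fully convolutional network is a toy encoder producing per-pixel embeddings (the names `lifetime_augmented_frame`, `SupereventFCN`, and `embed_dim` are illustrative, not the authors' API).

```python
# Hypothetical sketch of the abstract's pipeline: lifetime-augmented event
# frame -> fully convolutional network -> per-pixel embeddings for superevents.
# Assumptions: fixed lifetime per event, 2-channel polarity frame, toy FCN.
import torch
import torch.nn as nn

def lifetime_augmented_frame(events, height, width, t_ref, lifetime=0.03):
    """Rasterize events into a 2-channel frame (one channel per polarity),
    keeping only events whose assumed lifetime spans the reference time t_ref."""
    frame = torch.zeros(2, height, width)
    for x, y, t, p in events:
        if t <= t_ref <= t + lifetime:   # event still "alive" at t_ref
            frame[int(p), int(y), int(x)] = 1.0
    return frame

class SupereventFCN(nn.Module):
    """Toy fully convolutional network mapping an event frame to per-pixel
    embeddings that could subsequently be clustered into superevents."""
    def __init__(self, embed_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, embed_dim, 1),
        )

    def forward(self, x):
        return self.net(x)

if __name__ == "__main__":
    # Synthetic events: (x, y, t, p) with polarity p in {0, 1}.
    events = [(10, 12, 0.00, 1), (11, 12, 0.01, 1), (40, 5, 0.02, 0)]
    frame = lifetime_augmented_frame(events, height=64, width=64, t_ref=0.015)
    model = SupereventFCN()
    embeddings = model(frame.unsqueeze(0))   # shape: (1, embed_dim, 64, 64)
    print(embeddings.shape)
```

Grouping pixels by embedding similarity (e.g., with any off-the-shelf clustering step) would then yield the perceptually consistent local units the abstract calls superevents; that step is omitted here since the abstract does not specify it.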