EvConv: Fast CNN Inference on Event Camera Inputs For High-Speed Robot Perception

Event cameras capture visual information with high temporal resolution and a wide dynamic range, which lets them record rapidly changing scenes at fine time granularities (e.g., microseconds). This makes event cameras well suited to high-speed robotics tasks involving rapid motion, such as perception, object tracking, and control. However, convolutional neural network (CNN) inference on event camera streams cannot currently run in real time at the rates at which event cameras operate: current CNN inference times are typically closer in order of magnitude to the frame rates of regular frame-based cameras. Real-time inference at event camera rates is necessary to fully leverage the high frequency and high temporal resolution that event cameras offer. This paper presents EvConv, a new approach to enable fast inference on CNNs for inputs from event cameras. We observe that consecutive inputs to the CNN from an event camera differ only slightly. We therefore propose to perform inference on the difference between consecutive input tensors, which we call the increment. Because increments are very sparse, this significantly reduces the number of floating-point operations required, and thus the inference latency. We design EvConv to leverage the irregular sparsity in increments from event cameras and to retain the sparsity of these increments across all layers of the network. We demonstrate a reduction of up to 98% in the number of floating-point operations required in the forward pass. We also demonstrate a speedup of up to 1.6x for CNN inference on tasks such as depth estimation, object recognition, and optical flow estimation, with almost no loss in accuracy.
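The increment idea rests on the linearity of convolution: conv(x_t) = conv(x_{t-1}) + conv(x_t - x_{t-1}), so a cached output can be updated by touching only the nonzero entries of the increment. The following is a minimal pure-Python sketch of that identity for a single-channel, single linear layer. It is not the EvConv implementation: the function names `conv2d_dense` and `conv2d_increment`, the 8x8 frame, and the 3x3 kernel are illustrative assumptions, and the sketch does not address how sparsity is retained through nonlinear layers.

```python
# Illustrative sketch only (hypothetical names, not the paper's code):
# update a cached convolution output from a sparse increment.

def conv2d_dense(x, k):
    """Valid 2D cross-correlation of x with k. Returns (output, multiply count)."""
    H, W = len(x), len(x[0])
    kh, kw = len(k), len(k[0])
    out = [[0.0] * (W - kw + 1) for _ in range(H - kh + 1)]
    macs = 0
    for i in range(H - kh + 1):
        for j in range(W - kw + 1):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    acc += x[i + a][j + b] * k[a][b]
                    macs += 1
            out[i][j] = acc
    return out, macs


def conv2d_increment(prev_out, delta, k):
    """Add conv(delta) to a copy of prev_out, visiting only nonzero delta entries."""
    kh, kw = len(k), len(k[0])
    out = [row[:] for row in prev_out]
    macs = 0
    for r, row in enumerate(delta):
        for c, d in enumerate(row):
            if d == 0:
                continue  # unchanged pixel: contributes nothing
            # Input pixel (r, c) feeds output (r - a, c - b) through k[a][b].
            for a in range(kh):
                for b in range(kw):
                    i, j = r - a, c - b
                    if 0 <= i < len(out) and 0 <= j < len(out[0]):
                        out[i][j] += d * k[a][b]
                        macs += 1
    return out, macs


# Assumed toy data: an 8x8 frame in which two pixels change between steps.
prev = [[(r * 8 + c) % 5 for c in range(8)] for r in range(8)]
curr = [row[:] for row in prev]
curr[2][3] += 1
curr[6][6] -= 1
delta = [[curr[r][c] - prev[r][c] for c in range(8)] for r in range(8)]
kernel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]

prev_out, _ = conv2d_dense(prev, kernel)
dense_out, dense_macs = conv2d_dense(curr, kernel)
inc_out, inc_macs = conv2d_increment(prev_out, delta, kernel)

print(inc_out == dense_out)   # incremental result matches dense recomputation
print(dense_macs, inc_macs)   # multiplies: dense vs. incremental
```

In this toy case the incremental update reproduces the dense result with far fewer multiplications, because the work scales with the number of changed pixels rather than with the frame size. The figures reported in the abstract come from the authors' full system, not from a sketch like this one.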
