State-of-the-art machine-learning methods for event cameras treat events as dense representations and process them with conventional deep neural networks. Thus, they fail to maintain the sparsity and asynchronous nature of event data, thereby imposing significant computation and latency constraints on downstream systems. A recent line of work tackles this issue by modeling events as spatiotemporally evolving graphs that can be processed efficiently and asynchronously using graph neural networks. These works have shown impressive reductions in computation, yet their accuracy is still limited by the small scale and shallow depth of their networks, both of which are required to reduce computation. In this work, we break this glass ceiling by introducing several architecture choices that allow us to scale the depth and complexity of such models while maintaining low computation. On object detection tasks, our smallest model requires up to 3.7 times less computation while outperforming state-of-the-art asynchronous methods by 7.4 mAP. Even when scaling to larger model sizes, we are 13% more efficient than the state of the art while outperforming it by 11.5 mAP. As a result, our method runs 3.7 times faster than a dense graph neural network, taking only 8.4 ms per forward pass. This opens the door to efficient and accurate object detection in edge-case scenarios.
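To make the events-as-graphs idea concrete, the sketch below is a minimal illustration, not the paper's implementation: all function names, radii, time windows, and feature sizes are illustrative assumptions. It builds a spatiotemporal graph from a batch of events (x, y, t, polarity) by connecting events that are close in space and time, and applies one mean-aggregation message-passing step, the basic operation a graph neural network would repeat layer by layer on such a graph.

```python
# Minimal sketch, assuming events are given as rows (x, y, t, polarity) sorted by time.
import numpy as np

def build_event_graph(events, radius=5.0, dt_max=0.05):
    """Connect events that are close in space (pixels) and time (seconds)."""
    xy = events[:, :2]                      # pixel coordinates
    t = events[:, 2]                        # timestamps (assumed sorted)
    n = len(events)
    edges = []
    for i in range(n):
        # only look at temporally later events, keeping the graph causal
        for j in range(i + 1, n):
            if t[j] - t[i] > dt_max:
                break
            if np.linalg.norm(xy[j] - xy[i]) <= radius:
                edges.append((i, j))
    return np.array(edges, dtype=np.int64)  # shape (E, 2)

def message_passing(node_feats, edges, weight):
    """One mean-aggregation message-passing step (a stand-in for a GNN layer)."""
    out = node_feats @ weight               # per-node linear transform
    agg = np.zeros_like(out)
    deg = np.zeros(len(node_feats))
    for i, j in edges:
        agg[j] += out[i]                    # messages flow forward in time
        deg[j] += 1
    deg = np.maximum(deg, 1.0)
    return np.maximum(out + agg / deg[:, None], 0.0)   # ReLU

# toy usage: 200 random events with features (x, y, t, polarity)
rng = np.random.default_rng(0)
events = np.column_stack([
    rng.integers(0, 64, (200, 2)).astype(float),        # x, y
    np.sort(rng.uniform(0, 0.1, 200)),                   # t, sorted
    rng.choice([-1.0, 1.0], 200),                        # polarity
])
edges = build_event_graph(events)
feats = message_passing(events, edges, rng.standard_normal((4, 16)) * 0.1)
print(edges.shape, feats.shape)
```

Because each new event only adds a node and a few local edges, such a graph can be updated and processed asynchronously rather than rebuilt from scratch, which is what keeps the per-event computation low.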