Video data is often repetitive; for example, the content of adjacent frames is usually strongly correlated. Such repetition occurs at multiple levels of complexity, from low-level pixel values to textures and high-level semantics. We propose Event Neural Networks (EvNets), a novel class of networks that leverage this repetition to achieve considerable computation savings for video inference tasks. A defining characteristic of EvNets is that each neuron has state variables that provide it with long-term memory, which allows low-cost inference even in the presence of significant camera motion. We show that it is possible to transform virtually any conventional neural network into an EvNet. We demonstrate the effectiveness of our method on several state-of-the-art neural networks for both high- and low-level visual processing, including pose recognition, object detection, optical flow, and image enhancement. We observe up to an order-of-magnitude reduction in computational costs (2-20x) compared to conventional networks, with minimal reductions in model accuracy.
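To make the core idea concrete, the following is a minimal, hypothetical Python sketch of an "event-gated" neuron: it keeps state variables (its last processed input and cached activation) and pays the cost of recomputation only when the input changes by more than a threshold. The class name, threshold rule, and toy ReLU update are illustrative assumptions, not the paper's exact formulation.

import numpy as np  # not strictly needed for this toy, but typical in practice

class EventNeuron:
    # Illustrative stateful neuron (assumption, not the paper's exact method):
    # recompute the activation only when the input change exceeds a threshold,
    # otherwise reuse the cached value from the neuron's long-term state.
    def __init__(self, weight, bias, threshold=0.05):
        self.weight = weight            # scalar weight (toy example)
        self.bias = bias
        self.threshold = threshold      # minimum input change that triggers an update
        self.last_input = None          # state variable: last processed input
        self.last_output = None         # state variable: cached activation

    def __call__(self, x):
        if self.last_input is None or abs(x - self.last_input) > self.threshold:
            # Input changed enough: pay for a full recomputation.
            self.last_input = x
            self.last_output = max(0.0, self.weight * x + self.bias)  # ReLU
        # Otherwise: reuse the cached activation at negligible cost.
        return self.last_output

# Toy usage on a "video" of slowly varying inputs: only frames 0 and 3
# trigger recomputation; the rest reuse the stored activation.
neuron = EventNeuron(weight=0.5, bias=0.1, threshold=0.05)
frames = [1.00, 1.01, 1.02, 1.50, 1.51]
outputs = [neuron(x) for x in frames]
print(outputs)

In a full network, gating of this kind would apply per neuron across all layers, so the savings compound wherever inter-frame changes are sparse.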