Video data is often repetitive; for example, the contents of adjacent frames are usually strongly correlated. Such redundancy occurs at multiple levels of complexity, from low-level pixel values to textures and high-level semantics. We propose Event Neural Networks (EvNets), which leverage this redundancy to achieve considerable computation savings during video inference. A defining characteristic of EvNets is that each neuron has state variables that provide it with long-term memory, which allows low-cost, high-accuracy inference even in the presence of significant camera motion. We show that it is possible to transform a wide range of neural networks into EvNets without re-training. We demonstrate our method on state-of-the-art architectures for both high- and low-level visual processing, including pose recognition, object detection, optical flow, and image enhancement. We observe roughly an order-of-magnitude reduction in computational costs compared to conventional networks, with minimal reductions in model accuracy.
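To make the stated mechanism concrete, the following is a minimal illustrative sketch of one possible event-style neuron, assuming a simple threshold-gated update: the neuron caches the last input it actually processed and its output (its "state variables"), and recomputes only when the input has changed significantly. The class name `EventNeuron`, the `threshold` parameter, and the gating rule are assumptions chosen for illustration; they are not the paper's exact formulation.

```python
import numpy as np

class EventNeuron:
    """Sketch of a neuron with persistent state, assuming threshold-gated updates."""

    def __init__(self, weight, bias, threshold=0.1):
        self.w = np.asarray(weight, dtype=np.float64)
        self.b = float(bias)
        self.threshold = threshold  # assumed hyperparameter, not from the paper
        # State variables: the last input actually processed (long-term memory)
        # and the cached activation computed from it.
        self.x_ref = None
        self.y = 0.0

    def __call__(self, x):
        x = np.asarray(x, dtype=np.float64)
        if self.x_ref is None:
            # First frame: pay the full compute cost and initialize state.
            self.x_ref = x.copy()
            self.y = max(0.0, self.w @ x + self.b)  # ReLU activation
        elif np.abs(x - self.x_ref).sum() > self.threshold:
            # Input changed significantly: recompute and refresh the state.
            self.x_ref = x.copy()
            self.y = max(0.0, self.w @ x + self.b)
        # Otherwise the dot product is skipped and the cached output is reused.
        return self.y
```

On video, where adjacent frames are strongly correlated, most such neurons would reuse their cached output on most frames, which is the intuition behind the claimed computation savings; a full EvNet would additionally propagate sparse changes between layers rather than re-running downstream neurons on unchanged inputs.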