Convolutional neural network (CNN) inference on video input is computationally expensive and has high memory bandwidth requirements. Recently, researchers reduced the cost of processing subsequent frames by processing only the pixels that changed significantly. Using sparse convolutions, the sparsity of frame differences can be translated into speedups on current inference devices. However, previous work relied on static cameras. Moving cameras add a new challenge: how to efficiently fuse newly unveiled image regions with already processed regions so as to minimize the update rate, without increasing memory overhead and without knowing the camera extrinsics of future frames. In this work, we propose MotionDeltaCNN, a CNN framework that supports moving cameras and variable-resolution input. We propose a spherical buffer that enables seamless fusion of newly unveiled and previously processed regions without increasing the memory footprint. Our evaluations show that, by explicitly adding support for moving-camera input, we outperform previous work by up to 90%.
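The abstract does not describe how the spherical buffer is implemented. The sketch below is a hypothetical illustration of the idea under our own assumptions: the buffer uses wrap-around (modulo) indexing, so world pixel (x, y) is stored at cell (x mod W, y mod H). A panning camera therefore never shifts or reallocates memory, and cells belonging to regions that left the view are reused for newly unveiled regions. It also illustrates the delta idea: previously seen pixels are propagated only when they differ from the last propagated value by more than a threshold, while newly unveiled pixels are always processed. The names `SphericalBuffer`, `update`, and `threshold` are hypothetical. The sketch assumes single-channel pixels, integer camera translations, and frames of exactly the buffer's size. It covers only input-level fusion, not the sparse convolutions themselves.

```python
from typing import List, Optional, Tuple

Coord = Tuple[int, int]


class SphericalBuffer:
    """Hypothetical wrap-around buffer: world pixel (x, y) lives at cell
    (x mod width, y mod height). The memory footprint stays fixed while the
    camera pans; cells of regions that left the view get overwritten."""

    def __init__(self, width: int, height: int) -> None:
        self.width = width
        self.height = height
        self.cells = [[0.0] * width for _ in range(height)]
        self.view: Optional[Coord] = None  # world origin of the previous frame

    def _in_last_view(self, x: int, y: int) -> bool:
        # True if world pixel (x, y) was visible in the previous frame.
        if self.view is None:
            return False
        vx, vy = self.view
        return vx <= x < vx + self.width and vy <= y < vy + self.height

    def update(self, frame: List[List[float]], origin: Coord,
               threshold: float) -> List[Coord]:
        """Fuse a frame whose top-left pixel sits at world coordinate `origin`.

        Returns the world coordinates that must be (re)processed: newly
        unveiled pixels always, previously seen pixels only if they differ
        from their last propagated value by more than `threshold`."""
        if len(frame) != self.height or any(len(r) != self.width for r in frame):
            raise ValueError("frame must match the buffer size (assumption)")
        ox, oy = origin
        updates: List[Coord] = []
        for j, row in enumerate(frame):
            for i, value in enumerate(row):
                x, y = ox + i, oy + j
                cy, cx = y % self.height, x % self.width
                # A newly unveiled pixel's cell holds a stale value from a
                # different world pixel, so it is never compared against it.
                unveiled = not self._in_last_view(x, y)
                if unveiled or abs(value - self.cells[cy][cx]) > threshold:
                    self.cells[cy][cx] = value  # store last propagated value
                    updates.append((x, y))
                # Below-threshold changes are not stored, so they accumulate
                # until they exceed the threshold in a later frame.
        self.view = origin
        return updates
```

For example, with a 4×2 buffer, a first frame at origin `(0, 0)` returns all eight pixels. Re-feeding the same frame returns an empty list. After the camera pans one pixel to the right (origin `(1, 0)`), only the newly unveiled column, `[(4, 0), (4, 1)]`, is returned unless overlapping pixels changed beyond the threshold.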