Convolutional neural network inference on video input is computationally expensive and has high memory bandwidth requirements. Recently, researchers have reduced the cost of processing subsequent frames by only processing pixels that changed significantly. Using sparse convolutions, the sparsity of frame differences can be translated into speedups on current inference devices. However, previous work relied on static cameras. Moving cameras pose a new challenge: newly unveiled image regions must be fused with already processed regions efficiently to minimize the update rate, without increasing memory overhead and without knowing the camera extrinsics of future frames. In this work, we propose MotionDeltaCNN, a CNN framework that supports moving cameras and variable-resolution input. We propose a spherical buffer that enables seamless fusion of newly unveiled regions with previously processed regions without increasing the memory footprint. Our evaluations show that we significantly outperform previous work by explicitly adding support for moving-camera input.
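The core idea behind delta-based video inference can be illustrated with a minimal sketch. Because convolution is linear, the output for the current frame can be obtained by adding the convolution of the (sparse) frame difference to the previous output, and pixels whose change falls below a threshold can be skipped entirely. The function names, threshold value, and HWC layout below are assumptions for illustration, not the paper's actual implementation:

```python
import numpy as np

def delta_update(prev_frame, cur_frame, prev_output, conv, threshold=0.05):
    """Sketch of a delta update for a single linear convolution layer.

    `conv` must be a linear operator (e.g. a convolution without bias or
    nonlinearity) so that conv(prev + delta) == conv(prev) + conv(delta).
    """
    delta = cur_frame - prev_frame
    # Mask of pixels whose change exceeds the threshold (assumed HWC layout).
    mask = np.abs(delta).max(axis=-1) > threshold
    # Zero out sub-threshold changes; a sparse kernel would skip these
    # locations instead of multiplying by zero.
    sparse_delta = np.where(mask[..., None], delta, 0.0)
    # Update the cached output with only the significant changes.
    return prev_output + conv(sparse_delta), mask
```

With a static camera, most of `mask` is false between consecutive frames, which is what sparse convolutions exploit; a moving camera shifts the whole frame and breaks this alignment, which is the problem the spherical buffer addresses.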