In this paper we propose BlockCopy, a scheme that accelerates pretrained frame-based CNNs to process video more efficiently than standard frame-by-frame processing. To this end, a lightweight policy network determines the important regions in an image, and operations are applied on the selected regions only, using custom block-sparse convolutions. Features of non-selected regions are simply copied from the preceding frame, reducing the number of computations and the latency. The execution policy is trained with reinforcement learning in an online fashion, without requiring ground-truth annotations. Our universal framework is demonstrated on dense prediction tasks such as pedestrian detection, instance segmentation and semantic segmentation, using both state-of-the-art (Center and Scale Predictor, MGAN, SwiftNet) and standard baseline networks (Mask-RCNN, DeepLabV3+). BlockCopy achieves significant FLOPs savings and inference speedup with minimal impact on accuracy.
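The core block-copy idea described above can be illustrated with a minimal sketch. This is not the authors' implementation (which uses custom block-sparse CUDA convolutions); it only shows the data flow: a policy mask selects blocks to recompute on the current frame, and all other feature blocks are copied from the previous frame's features. The names `blockcopy_step` and `compute_block` are hypothetical.

```python
import numpy as np

def blockcopy_step(prev_features, frame, block_mask, compute_block, block=32):
    """Sketch of the BlockCopy execution step (illustrative only).

    prev_features : feature map of the preceding frame (H, W)
    frame         : current input frame (H, W)
    block_mask    : boolean policy output (H // block, W // block);
                    True means "recompute this block"
    compute_block : the (expensive) per-block feature computation
    """
    # Start from the previous features: non-selected blocks are copied as-is.
    out = prev_features.copy()
    n_rows, n_cols = block_mask.shape
    for i in range(n_rows):
        for j in range(n_cols):
            if block_mask[i, j]:
                ys, xs = i * block, j * block
                # Only selected blocks pay the computation cost.
                out[ys:ys + block, xs:xs + block] = compute_block(
                    frame[ys:ys + block, xs:xs + block])
    return out
```

In the real system the selected blocks are processed jointly by block-sparse convolutions rather than in a Python loop, so the FLOPs saved are proportional to the fraction of non-selected blocks.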