SegBlocks reduces the computational cost of existing neural networks by dynamically adjusting the processing resolution of image regions based on their complexity. Our method splits an image into blocks and downsamples blocks of low complexity, reducing the number of operations and memory consumption. A lightweight policy network that selects the complex regions is trained using reinforcement learning. In addition, we introduce several modules implemented in CUDA to process images in blocks. Most importantly, our novel BlockPad module prevents the feature discontinuities at block borders from which existing methods suffer, while keeping memory consumption under control. Our experiments on the Cityscapes, CamVid and Mapillary Vistas datasets for semantic segmentation show that dynamically processing images offers a better accuracy-versus-complexity trade-off than static baselines of similar complexity. For instance, our method reduces the number of floating-point operations of SwiftNet-RN18 by 60% and increases the inference speed by 50%, with only a 0.3% decrease in mIoU accuracy on Cityscapes.
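The block-wise idea described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the SegBlocks implementation: the actual method selects blocks with a policy network trained by reinforcement learning and processes them with CUDA modules, whereas this sketch substitutes a simple pixel-variance threshold as a stand-in policy. The function names, the block size, the threshold, and the 2x downsampling factor are all assumptions made for illustration.

```python
from statistics import pvariance


def split_into_blocks(image, block_size):
    """Split a 2D list (H x W) into a dict {(row, col): block of 2D lists}."""
    h, w = len(image), len(image[0])
    if block_size % 2 or h % block_size or w % block_size:
        raise ValueError("block_size must be even and divide both image dimensions")
    blocks = {}
    for r in range(h // block_size):
        for c in range(w // block_size):
            blocks[(r, c)] = [
                row[c * block_size:(c + 1) * block_size]
                for row in image[r * block_size:(r + 1) * block_size]
            ]
    return blocks


def downsample_2x(block):
    """Halve a block's resolution by averaging each 2x2 neighbourhood."""
    n = len(block)
    return [
        [
            (block[i][j] + block[i][j + 1] + block[i + 1][j] + block[i + 1][j + 1]) / 4
            for j in range(0, n, 2)
        ]
        for i in range(0, n, 2)
    ]


def is_complex(block, threshold):
    """Hypothetical stand-in for the learned policy: high variance means 'complex'."""
    pixels = [p for row in block for p in row]
    return pvariance(pixels) > threshold


def process_dynamic(image, block_size=4, threshold=0.01):
    """Keep complex blocks at full resolution and downsample the rest.

    Returns the per-block outputs, the complexity mask, and the fraction of
    pixels that remain to be processed relative to the full image.
    """
    blocks = split_into_blocks(image, block_size)
    mask, out = {}, {}
    for key, block in blocks.items():
        mask[key] = is_complex(block, threshold)
        out[key] = block if mask[key] else downsample_2x(block)
    kept = sum(len(b) * len(b[0]) for b in out.values())
    total = len(image) * len(image[0])
    return out, mask, kept / total


if __name__ == "__main__":
    # 8x8 image: a checkerboard in the top-left 4x4 block, flat elsewhere.
    image = [[0.0] * 8 for _ in range(8)]
    for i in range(4):
        for j in range(4):
            image[i][j] = float((i + j) % 2)
    _, mask, fraction = process_dynamic(image)
    print(mask, fraction)
```

In this toy setting each downsampled block holds a quarter of its original pixels, so the saving grows with the share of blocks judged simple. The sketch does not address the feature discontinuities at block borders that the BlockPad module is designed to prevent.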