In this project, we present ShelfNet, a lightweight convolutional neural network for accurate real-time semantic segmentation. Unlike the standard encoder-decoder structure, ShelfNet has multiple encoder-decoder branch pairs with skip connections at each spatial level, so the network resembles a shelf with multiple columns. This shelf-shaped structure provides multiple paths for information flow and improves segmentation accuracy. Inspired by the success of recurrent convolutional neural networks, we use modified residual blocks in which the two convolutional layers share weights. The shared-weight block enables efficient feature extraction and reduces model size. We tested ShelfNet with ResNet50 and ResNet101 backbones, which ran at 59 FPS and 42 FPS, respectively, on a GTX 1080Ti GPU with a 512x512 input image. ShelfNet also achieved high accuracy: on the PASCAL VOC 2012 test set, it reached 84.2% mIoU with the ResNet101 backbone and 82.8% mIoU with the ResNet50 backbone; on the Cityscapes dataset, it reached 75.8% mIoU with the ResNet50 backbone. ShelfNet achieved both higher mIoU and faster inference than state-of-the-art real-time semantic segmentation models. The implementation is available at https://github.com/juntang-zhuang/ShelfNet.
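The shared-weight residual block can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation (see the repository for that). The class name `SharedWeightResidualBlock`, the 3x3 kernel size, and the use of a separate batch-norm layer after each convolution are assumptions made here for illustration. The only point it demonstrates is that one convolution module, applied twice, gives the depth of a two-layer residual block with the convolution weights of one layer.

```python
import torch
import torch.nn as nn


class SharedWeightResidualBlock(nn.Module):
    """Hypothetical residual block whose two convolutions reuse one weight tensor."""

    def __init__(self, channels: int):
        super().__init__()
        # A single convolution module; forward() applies it twice,
        # so both "layers" share the same weights.
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        # Assumption of this sketch: each application gets its own batch norm.
        self.bn1 = nn.BatchNorm2d(channels)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.bn1(self.conv(x)))  # first use of the shared conv
        out = self.bn2(self.conv(out))           # second use, same weights
        return self.relu(out + x)                # identity skip connection


def count_parameters(module: nn.Module) -> int:
    """Total number of learnable parameters in a module."""
    return sum(p.numel() for p in module.parameters())


block = SharedWeightResidualBlock(channels=64)
x = torch.randn(1, 64, 32, 32)
y = block(x)  # padding=1 keeps the spatial size, so y has the same shape as x
conv_weights = block.conv.weight.numel()
```

With 64 channels, the shared convolution holds 64 x 64 x 3 x 3 = 36,864 weights, whereas a residual block with two independent 3x3 convolutions of the same width would hold 73,728. Gradients from both applications accumulate into the same weight tensor during training.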