Scale-permuted networks have shown promising results on object bounding box detection and instance segmentation. Scale permutation and cross-scale fusion of features enable the network to capture multi-scale semantics while preserving spatial resolution. In this work, we evaluate this meta-architecture design on semantic segmentation, another vision task that benefits from high spatial resolution and multi-scale feature fusion at different network stages. By further leveraging dilated convolution operations, we propose SpineNet-Seg, a network discovered by NAS search built on the DeepLabv3 system. The search yields an improved scale-permuted network topology with customized dilation ratios per block, optimized directly for semantic segmentation. SpineNet-Seg models outperform the DeepLabv3/v3+ baselines at all model scales on multiple popular benchmarks in both speed and accuracy. In particular, our SpineNet-S143+ model achieves a new state of the art on the popular Cityscapes benchmark at 83.04% mIoU and attains strong performance on the PASCAL VOC2012 benchmark at 85.56% mIoU. SpineNet-Seg models also show promising results on a challenging Street View segmentation dataset. Code and checkpoints will be open-sourced.
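For readers unfamiliar with the mechanism referenced above, the sketch below illustrates what "customized dilation ratios per block" means in practice: each convolutional block widens its receptive field by a different dilation rate without reducing spatial resolution. This is a minimal sketch in standard tf.keras, not the authors' implementation; the block widths and the dilation schedule are illustrative assumptions (in SpineNet-Seg these ratios are selected per block by the NAS search rather than hand-picked).

```python
# Minimal sketch (NOT the SpineNet-Seg code): residual blocks whose 3x3 convs
# use per-block dilation rates, preserving spatial resolution throughout.
import tensorflow as tf

def dilated_residual_block(x, filters, dilation_rate):
    """3x3 residual block; `dilation_rate` widens the receptive field."""
    shortcut = x
    y = tf.keras.layers.Conv2D(filters, 3, padding="same",
                               dilation_rate=dilation_rate)(x)
    y = tf.keras.layers.BatchNormalization()(y)
    y = tf.keras.layers.ReLU()(y)
    y = tf.keras.layers.Conv2D(filters, 3, padding="same",
                               dilation_rate=dilation_rate)(y)
    y = tf.keras.layers.BatchNormalization()(y)
    if shortcut.shape[-1] != filters:  # match channels for the residual add
        shortcut = tf.keras.layers.Conv2D(filters, 1)(shortcut)
    return tf.keras.layers.ReLU()(y + shortcut)

# Hypothetical per-block dilation schedule; in SpineNet-Seg the per-block
# ratios come out of the architecture search, not a fixed pattern like this.
inputs = tf.keras.Input(shape=(512, 512, 3))
x = tf.keras.layers.Conv2D(64, 7, strides=2, padding="same")(inputs)
for filters, rate in [(64, 1), (128, 2), (128, 4), (256, 2)]:
    x = dilated_residual_block(x, filters, rate)
model = tf.keras.Model(inputs, x)
```

Dilation trades none of the feature-map resolution for a larger receptive field, which is why dilated backbones pair well with dense-prediction tasks like segmentation.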