State-of-the-art systems for semantic image segmentation use feed-forward pipelines with fixed computational costs. Building an image segmentation system that works across a range of computational budgets is challenging and time-intensive, as new architectures must be designed and trained for every computational setting. To address this problem, we develop a recurrent neural network that successively improves prediction quality with each iteration. Importantly, the RNN may be deployed across a range of computational budgets by merely running the model for a variable number of iterations. We find that this architecture is uniquely suited to efficiently segmenting videos. By exploiting the segmentation of past frames, the RNN can perform video segmentation at similar quality but reduced computational cost compared to state-of-the-art image segmentation methods. When applied to static images in the PASCAL VOC 2012 and Cityscapes segmentation datasets, the RNN traces out a speed-accuracy curve that saturates near the performance of state-of-the-art segmentation methods.
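The two key mechanisms described above — trading compute for quality by varying the number of recurrent iterations, and warm-starting a video frame from the previous frame's state — can be sketched in a toy form as follows. Everything here is an illustrative assumption: the dimensions, the random weights, and the simple per-pixel recurrent update are stand-ins, not the paper's actual (convolutional) architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions; the real model operates on full images
# with convolutional recurrent layers.
H, W, NUM_CLASSES, HIDDEN = 4, 4, 3, 8

# Random weights stand in for a trained model.
W_in = rng.normal(scale=0.1, size=(1 + NUM_CLASSES, HIDDEN))
W_rec = rng.normal(scale=0.1, size=(HIDDEN, HIDDEN))
W_out = rng.normal(scale=0.1, size=(HIDDEN, NUM_CLASSES))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def segment(image, num_iters, state=None):
    """Refine a per-pixel segmentation for `num_iters` recurrent steps.

    Passing `state` lets a video pipeline warm-start from the previous
    frame's hidden state and run fewer iterations per frame.
    """
    if state is None:
        state = np.zeros((H, W, HIDDEN))
    probs = np.full((H, W, NUM_CLASSES), 1.0 / NUM_CLASSES)
    for _ in range(num_iters):
        # Feed the current prediction back in alongside the image.
        inp = np.concatenate([image, probs], axis=-1)
        state = np.tanh(inp @ W_in + state @ W_rec)
        probs = softmax(state @ W_out)
    return probs, state

frame = rng.normal(size=(H, W, 1))

# Static image: more iterations means more compute and, in the trained
# model, better masks along the speed-accuracy curve.
cheap, _ = segment(frame, num_iters=1)
refined, state = segment(frame, num_iters=8)

# Video: carry the state forward and spend only 2 iterations on the
# next (similar) frame.
next_frame = frame + 0.01 * rng.normal(size=(H, W, 1))
fast, _ = segment(next_frame, num_iters=2, state=state)
```

The same trained weights serve every budget: a caller simply picks `num_iters` at deployment time, and on video the carried `state` lets each frame start from a near-correct segmentation rather than from scratch.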