In this paper, we propose a novel fully convolutional two-stream fusion network (FCTSFN) for interactive image segmentation. The proposed network consists of two sub-networks: a two-stream late fusion network (TSLFN) that predicts the foreground at a reduced resolution, and a multi-scale refining network (MSRN) that refines the foreground at full resolution. The TSLFN comprises two distinct deep streams followed by a fusion network. The intuition is that user interactions carry more direct information about foreground and background than the image itself, so the two-stream structure of the TSLFN reduces the number of layers between the pure user-interaction features and the network output, allowing the user interactions to have a more direct impact on the segmentation result. The MSRN fuses features from different layers of the TSLFN at different scales, capturing local-to-global information about the foreground in order to refine the segmentation result at full resolution. We conduct comprehensive experiments on four benchmark datasets. The results show that the proposed network achieves competitive performance compared to current state-of-the-art interactive image segmentation methods.
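The data flow described above can be illustrated with a minimal NumPy sketch. It is not the authors' architecture: the function names (`stream`, `fctsfn_sketch`), the channel widths, the three 2x downsampling stages, the two-channel click encoding, and the random 1x1 convolutions standing in for learned layers are all assumptions made for illustration. The sketch only shows the two ideas in the abstract: the interaction stream reaches the output through a short fusion head (late fusion), and features from several scales are upsampled and fused to refine the prediction at full resolution.

```python
import numpy as np


def avg_pool2(x):
    """Halve the spatial resolution of a (C, H, W) array by 2x2 average pooling."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def upsample(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) array by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def conv1x1(x, out_channels, rng, relu=True):
    """Toy stand-in for a learned layer: a random 1x1 convolution (assumption)."""
    w = rng.standard_normal((out_channels, x.shape[0])) / np.sqrt(x.shape[0])
    y = np.einsum("oc,chw->ohw", w, x)
    return np.maximum(y, 0.0) if relu else y


def stream(x, widths, rng):
    """One stream: each stage is a 1x1 conv followed by 2x downsampling."""
    feats = []
    for width in widths:
        x = avg_pool2(conv1x1(x, width, rng))
        feats.append(x)
    return feats


def fctsfn_sketch(image, clicks, seed=0):
    """Return (coarse logits at 1/8 resolution, refined probabilities at full resolution)."""
    rng = np.random.default_rng(seed)
    widths = (8, 16, 32)  # hypothetical channel widths

    # TSLFN-like part: two separate streams, fused only at the end.
    img_feats = stream(image, widths, rng)
    int_feats = stream(clicks, widths, rng)
    fused = conv1x1(np.concatenate([img_feats[-1], int_feats[-1]]), 32, rng)
    coarse = conv1x1(fused, 1, rng, relu=False)  # reduced-resolution prediction

    # MSRN-like part: bring features from every scale back to full resolution
    # and fuse them with the upsampled coarse prediction.
    multi_scale = [upsample(coarse, 2 ** len(widths))]
    for i, (f_img, f_int) in enumerate(zip(img_feats, int_feats)):
        multi_scale.append(upsample(np.concatenate([f_img, f_int]), 2 ** (i + 1)))
    refined = conv1x1(np.concatenate(multi_scale), 1, rng, relu=False)
    return coarse, 1.0 / (1.0 + np.exp(-refined))  # sigmoid -> foreground probability
```

Calling `fctsfn_sketch` with a `(3, 32, 32)` image and a `(2, 32, 32)` click map (one channel each for positive and negative clicks, again an assumed encoding) yields a `(1, 4, 4)` coarse map and a `(1, 32, 32)` refined probability map. In this sketch only two layers separate the deepest interaction features from the coarse output, which is the kind of shortened path the abstract's intuition refers to.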