Single-stage instance segmentation approaches have recently gained popularity due to their speed and simplicity, but they still lag behind two-stage methods in accuracy. We propose a fast single-stage instance segmentation method, called SipMask, that preserves instance-specific spatial information by separating the mask prediction of an instance into different sub-regions of a detected bounding box. Our main contribution is a novel light-weight spatial preservation (SP) module that generates a separate set of spatial coefficients for each sub-region within a bounding box, leading to improved mask predictions. It also enables accurate delineation of spatially adjacent instances. Further, we introduce a mask alignment weighting loss and a feature alignment scheme to better correlate mask prediction with object detection. On COCO test-dev, SipMask outperforms the existing single-stage methods. Compared to the state-of-the-art single-stage TensorMask, SipMask obtains an absolute gain of 1.0% (mask AP), while providing a four-fold speedup. In terms of real-time capabilities, SipMask outperforms YOLACT with an absolute gain of 3.0% (mask AP) under similar settings, while operating at comparable speed on a Titan Xp. We also evaluate SipMask for real-time video instance segmentation, achieving promising results on the YouTube-VIS dataset. The source code is available at https://github.com/JialeCao001/SipMask.
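The idea of predicting a separate set of coefficients for each sub-region of a bounding box can be illustrated with a minimal sketch. This is not the released SipMask implementation. The function name `assemble_mask`, the 2×2 sub-region grid, the use of shared basis maps combined linearly, and the sigmoid with a 0.5 threshold are all assumptions made for illustration, written in plain Python so the example is self-contained.

```python
import math


def assemble_mask(basis, box, coeffs, grid=2, threshold=0.5):
    """Hypothetical sketch: combine K basis maps with per-sub-region coefficients.

    basis  : list of K maps, each a list of H rows with W floats
    box    : (x0, y0, x1, y1), half-open pixel coordinates of the detected box
    coeffs : grid * grid lists of K floats, one list per sub-region (row-major)
    """
    height, width = len(basis[0]), len(basis[0][0])
    x0, y0, x1, y1 = box
    mask = [[0] * width for _ in range(height)]
    for y in range(max(y0, 0), min(y1, height)):
        row = (y - y0) * grid // (y1 - y0)      # sub-region row of this pixel
        for x in range(max(x0, 0), min(x1, width)):
            col = (x - x0) * grid // (x1 - x0)  # sub-region column of this pixel
            region = coeffs[row * grid + col]   # coefficients of that sub-region only
            logit = sum(c * b[y][x] for c, b in zip(region, basis))
            prob = 1.0 / (1.0 + math.exp(-logit))
            mask[y][x] = 1 if prob > threshold else 0
    return mask  # pixels outside the box stay 0


# Toy input (assumed values): a constant map and a "left half" indicator map.
basis = [
    [[1.0] * 4 for _ in range(4)],
    [[1.0 if x < 2 else 0.0 for x in range(4)] for _ in range(4)],
]
# One coefficient list per sub-region: top-left, top-right, bottom-left, bottom-right.
coeffs = [[4.0, -8.0], [4.0, -8.0], [-4.0, 8.0], [-4.0, 8.0]]
mask = assemble_mask(basis, (0, 0, 4, 4), coeffs)
for line in mask:
    print(line)
```

In this toy case the upper and lower halves of the box use different coefficients over the same basis maps, so the resulting mask differs between them. A single coefficient set shared by the whole box could not produce that pattern from these two maps, which is the intuition behind preserving spatial information per sub-region.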