Salient Object Ranking with Position-Preserved Attention

Instance segmentation can detect where objects are in an image, but it struggles to capture the relationships between them. We focus on one typical relationship: relative saliency. A closely related task, salient object detection, predicts a binary map highlighting visually salient regions but struggles to distinguish between multiple objects. Directly combining the two tasks through post-processing also leads to poor performance. Relative saliency has so far received little research attention, which limits practical applications such as content-aware image cropping, video summarization, and image labeling. In this paper, we study the Salient Object Ranking (SOR) task, which assigns a ranking order to each detected object according to its visual saliency. We propose the first end-to-end framework for the SOR task and solve it in a multi-task learning fashion: the framework handles instance segmentation and salient object ranking simultaneously. Within this framework, the SOR branch is independent and can flexibly cooperate with different detection methods, so it is easy to use as a plug-in. We also introduce a Position-Preserved Attention (PPA) module tailored for the SOR branch, which consists of a position embedding stage and a feature interaction stage. Because position is important when comparing saliency, the first stage preserves the absolute coordinates of objects during the ROI pooling operation and fuses this positional information with the semantic features. In the feature interaction stage, we apply an attention mechanism to obtain contextualized representations of the proposals, which are then used to predict their relative ranking orders. Extensive experiments on the ASR dataset show that, without bells and whistles, our proposed method significantly outperforms the previous state-of-the-art method. The code will be made publicly available.
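The two PPA stages described in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation: the additive sinusoidal position embedding, the `scale` factor, the projection-free single-head attention, and the linear scoring vector `head_w` are all assumptions chosen for illustration, and every function name is hypothetical. Boxes are carried alongside the pooled features so that absolute, image-normalized coordinates survive the ROI pooling step and can be fused back in.

```python
import math
from typing import List, Sequence, Tuple

# (x1, y1, x2, y2) in absolute pixel coordinates
Box = Tuple[float, float, float, float]


def position_embedding(box: Box, img_w: float, img_h: float, dim: int,
                       scale: float = 100.0) -> List[float]:
    """Sinusoidal embedding of a box's absolute, image-normalized coordinates.

    Assumption: each of the 4 coordinates gets dim // 4 channels of
    (sin, cos) pairs; `scale` is a hypothetical spreading factor.
    """
    assert dim % 8 == 0, "dim must split into 4 coords x (sin, cos) pairs"
    x1, y1, x2, y2 = box
    coords = (x1 / img_w, y1 / img_h, x2 / img_w, y2 / img_h)
    per = dim // 4
    emb: List[float] = []
    for c in coords:
        for i in range(per // 2):
            angle = c * scale / (10000 ** (2 * i / per))
            emb.extend((math.sin(angle), math.cos(angle)))
    return emb


def fuse(semantic: Sequence[float], pos: Sequence[float]) -> List[float]:
    """Stage 1 (assumed additive fusion): semantic ROI feature + position."""
    return [s + p for s, p in zip(semantic, pos)]


def self_attention(feats: List[List[float]]) -> List[List[float]]:
    """Stage 2: single-head scaled dot-product attention over all proposals.

    Simplification: queries, keys and values are the fused features
    themselves (no learned projection matrices).
    """
    n, d = len(feats), len(feats[0])
    out = []
    for q in feats:
        logits = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in feats]
        m = max(logits)  # subtract the max for numerical stability
        w = [math.exp(s - m) for s in logits]
        z = sum(w)
        w = [x / z for x in w]  # softmax weights over all proposals
        out.append([sum(w[j] * feats[j][c] for j in range(n)) for c in range(d)])
    return out


def rank_objects(semantic: List[List[float]], boxes: List[Box],
                 img_w: float, img_h: float,
                 head_w: Sequence[float]) -> Tuple[List[float], List[int]]:
    """Score each proposal and convert scores to ranks (1 = most salient)."""
    dim = len(semantic[0])
    fused = [fuse(f, position_embedding(b, img_w, img_h, dim))
             for f, b in zip(semantic, boxes)]
    ctx = self_attention(fused)
    # Hypothetical linear scoring head on the contextualized features.
    scores = [sum(w * x for w, x in zip(head_w, h)) for h in ctx]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return scores, ranks


if __name__ == "__main__":
    semantic = [[0.2] * 8, [0.5] * 8, [0.1] * 8]  # toy ROI features, dim = 8
    boxes = [(10, 10, 60, 60), (100, 40, 220, 200), (300, 10, 320, 30)]
    scores, ranks = rank_objects(semantic, boxes, 320.0, 240.0, [0.1] * 8)
    print(scores, ranks)
```

In a trained model the scoring head and any attention projections would be learned jointly with the detection branch; here `head_w` is a fixed toy vector, so the printed ranks only demonstrate the data flow from positions and features to a ranking, not meaningful saliency.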

翻译：我们关注一个典型的关系, 相对显著性。一个密切相关的任务, 突出的物体探测, 预测一个二进制的地图, 突出显示一个显要区域, 而很难区分多个对象。通过后处理直接合并两个任务, 也导致性能不佳。目前缺乏对相对显著性的研究, 限制了内容认知图像裁剪、视频摘要和图像标签等实用应用。在本文中, 我们研究一个高亮对象排序( SOR) 任务, 能够根据每个被检测对象的视觉突出性能指定一个顺序。我们建议SOR 任务的第一个端对端框架, 并用多任务学习的方式解决它。框架同时处理实例分割和突出对象排序。在此框架中, SOR 分支是独立和灵活的, 与不同的检测方法合作, 从而容易用作插件。我们还在 SOR 的直径定位上设置了定位( PPA ), 将一个相对直径排序的动作比对等。