Sharp Eyes: A Salient Object Detector Working The Same Way as Human Visual Characteristics

Current methods aggregate multi-level features or introduce edge and skeleton cues to obtain more refined saliency maps. However, little attention has been paid to obtaining the complete salient object against a cluttered background, where the target is usually similar in color and texture to its surroundings. To handle such complex scenes, we propose a sharp eyes network (SENet) that first separates the object from the scene and then finely segments it, in line with human visual characteristics, i.e., look first and then focus. Unlike previous methods, which directly integrate edges or skeletons to compensate for defects in the predicted objects, the proposed method uses expanded objects to guide the network toward a complete prediction. Specifically, SENet mainly consists of a target separation (TS) branch and an object segmentation (OS) branch, trained by minimizing a new hierarchical difference aware (HDA) loss. In the TS branch, we construct a fractal structure that produces saliency features with expanded boundaries under the supervision of an expanded ground truth, which enlarges the detail difference between foreground and background. In the OS branch, we first aggregate multi-level features to adaptively select complementary components, and then feed the saliency features with expanded boundaries into the aggregated features to guide the network to obtain a complete prediction. Moreover, we propose the HDA loss, which hierarchically assigns a weight to each pixel according to its distance from the boundary, to further improve the structural integrity and local details of the salient objects. Hard pixels with similar appearance in the border region receive hierarchically more attention to emphasize their importance for a complete prediction. Comprehensive experimental results on five datasets demonstrate that the proposed approach outperforms state-of-the-art methods both quantitatively and qualitatively.
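
The abstract does not give the exact form of the expanded ground truth or of the HDA loss, so the following is only a minimal sketch of the two ideas under stated assumptions. It assumes that the expanded ground truth can be approximated by a square-window binary dilation, that "hierarchical" weighting means discrete distance bands around the object boundary, and that the weights modulate a per-pixel binary cross-entropy. The function names, the band thresholds, and the weight values are all hypothetical and are not taken from the paper.

```python
import math
from collections import deque

# 8-connected neighbourhood offsets.
NEIGHBORS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]


def dilate(mask, radius=1):
    """Expand a binary mask with a square window (assumed stand-in for the
    'expanded ground truth'; the paper's actual expansion may differ)."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                    for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                        out[yy][xx] = 1
    return out


def boundary_distance(mask):
    """Chebyshev distance from each pixel to the nearest pixel of the other
    class, computed by a multi-source breadth-first search. Entries stay
    None when the mask contains a single class (no boundary exists)."""
    h, w = len(mask), len(mask[0])
    dist = [[None] * w for _ in range(h)]
    queue = deque()
    # Seed the search with every pixel that touches the other class.
    for y in range(h):
        for x in range(w):
            for dy, dx in NEIGHBORS:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] != mask[y][x]:
                    dist[y][x] = 1
                    queue.append((y, x))
                    break
    # Grow outwards one ring at a time.
    while queue:
        y, x = queue.popleft()
        for dy, dx in NEIGHBORS:
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and dist[ny][nx] is None:
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))
    return dist


def hierarchical_weights(mask, levels=((1, 4.0), (2, 2.0)), base=1.0):
    """Assign each pixel the weight of the nearest distance band it falls in.
    `levels` is a hypothetical list of (max_distance, weight) pairs, ordered
    from the band closest to the boundary outwards."""
    weights = []
    for row in boundary_distance(mask):
        out_row = []
        for d in row:
            weight = base
            if d is not None:
                for max_dist, level_weight in levels:
                    if d <= max_dist:
                        weight = level_weight
                        break
            out_row.append(weight)
        weights.append(out_row)
    return weights


def weighted_bce(pred, target, weights, eps=1e-7):
    """Weight-normalised binary cross-entropy over all pixels."""
    total, norm = 0.0, 0.0
    for p_row, t_row, w_row in zip(pred, target, weights):
        for p, t, w in zip(p_row, t_row, w_row):
            p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
            total += -w * (t * math.log(p) + (1 - t) * math.log(1.0 - p))
            norm += w
    return total / norm
```

With these illustrative bands, a misclassified pixel on the object border contributes more to the loss than the same error deep inside the object, which is the behaviour the abstract describes for hard pixels in the border region. A practical implementation would operate on tensors in a deep learning framework rather than on nested lists.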
