Global and local context both contribute significantly to the integrity of predictions in Salient Object Detection (SOD). Unfortunately, existing methods still struggle to generate complete predictions with fine details. Conventional approaches have two major problems. First, for global context, high-level CNN-based encoder features cannot effectively capture long-range dependencies, which results in incomplete predictions. Second, downsampling the ground truth to fit the size of the predictions introduces inaccuracy, because ground-truth details are lost during interpolation or pooling. In this work, we therefore develop a Transformer-based network and frame a supervised task for one branch to learn global context information explicitly. In addition, we adopt Pixel Shuffle from Super-Resolution (SR) to reshape the predictions back to the size of the ground truth rather than the reverse, so the details in the ground truth remain untouched. We also develop a two-stage Context Refinement Module (CRM) that fuses global context and automatically locates and refines local details in the predictions. Because the proposed network can guide and correct itself based on the global and local context it generates, we name it the Self-Refined Transformer (SelfReformer). Extensive experiments and evaluation results on five benchmark datasets demonstrate the outstanding performance of the network, which achieves state-of-the-art results.
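As an illustration of the Pixel Shuffle step mentioned above, the following minimal sketch rearranges a low-resolution tensor with r*r times as many channels into a full-resolution one, so that the prediction is enlarged to the ground-truth size rather than the ground truth being shrunk. The abstract does not specify the channel layout or upscale factor; the layout here is assumed to follow the common sub-pixel convolution convention, and the function name `pixel_shuffle` and the toy input are hypothetical, not the authors' implementation.

```python
def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) nested list into (C, H*r, W*r).

    Assumed layout: input channel c*r*r + i*r + j supplies the pixel at
    row offset i and column offset j inside each r-by-r output block.
    """
    channels_in = len(x)
    if channels_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = channels_in // (r * r)
    h, w = len(x[0]), len(x[0][0])
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):
            for j in range(r):
                src = x[c * r * r + i * r + j]
                for row in range(h):
                    for col in range(w):
                        out[c][row * r + i][col * r + j] = src[row][col]
    return out


# Toy input: 4 channels of size 1x2 become 1 channel of size 2x4 (r = 2).
low_res = [[[1, 2]], [[3, 4]], [[5, 6]], [[7, 8]]]
full_res = pixel_shuffle(low_res, 2)
print(full_res)  # [[[1, 3, 2, 4], [5, 7, 6, 8]]]
```

No values are interpolated or pooled in this rearrangement: every input element appears exactly once in the output, which is why the loss can be computed against the ground truth at its original resolution.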