We present a systematic study of a new task called dichotomous image segmentation (DIS), which aims to segment objects from natural images with high accuracy. To this end, we collected the first large-scale DIS dataset, DIS5K, which contains 5,470 high-resolution (e.g., 2K, 4K or larger) images covering camouflaged, salient, or meticulous objects in various backgrounds. DIS5K is annotated with extremely fine-grained labels. In addition, we introduce a simple intermediate supervision baseline (IS-Net) that uses both feature-level and mask-level guidance for DIS model training. IS-Net outperforms various cutting-edge baselines on the proposed DIS5K, making it a general self-learned supervision network that can facilitate future research in DIS. Further, we design a new metric called human correction efforts (HCE), which approximates the number of mouse-click operations required to correct the false positives and false negatives. HCE measures the gap between models and real-world applications and can thus complement existing metrics. Finally, we conduct the largest-scale benchmark to date, evaluating 16 representative segmentation models, providing a more insightful discussion of object complexities, and showing several potential applications (e.g., background removal, art design, 3D reconstruction). We hope these efforts can open up promising directions for both academia and industry. Project page: https://xuebinqin.github.io/dis/index.html.
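To make the intuition behind a correction-effort metric concrete, the sketch below counts the separate false-positive and false-negative regions in a predicted mask, on the assumption that each region needs at least one corrective action. This is only a crude illustrative proxy and not the HCE definition used in this work, which the abstract does not spell out. The function names `count_regions` and `correction_effort_proxy`, the choice of 4-connectivity, and the toy masks are all assumptions made for this illustration.

```python
from collections import deque


def count_regions(mask):
    """Count 4-connected regions of True cells in a 2D list of booleans."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    regions = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # New unvisited region: flood-fill it with a breadth-first search.
            regions += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return regions


def correction_effort_proxy(pred, gt):
    """Hypothetical proxy: number of FP regions plus number of FN regions."""
    # False positives: predicted foreground where the ground truth is background.
    fp = [[bool(p) and not bool(g) for p, g in zip(pr, gr)]
          for pr, gr in zip(pred, gt)]
    # False negatives: ground-truth foreground that the prediction missed.
    fn = [[bool(g) and not bool(p) for p, g in zip(pr, gr)]
          for pr, gr in zip(pred, gt)]
    return count_regions(fp) + count_regions(fn)


# Toy 4x5 masks (1 = foreground), invented for this example.
gt = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1],
]
pred = [
    [0, 1, 1, 0, 1],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

print(correction_effort_proxy(pred, gt))  # 3: one FP region, two FN regions
```

Unlike pixel-overlap scores, a count of this kind grows with the number of separate mistakes rather than with their area, which is the sense in which an effort-based metric can complement existing ones.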