Interactive segmentation enables users to segment objects as needed by providing cues on the objects of interest, introducing human-computer interaction to many fields, such as image editing and medical image analysis. Typically, massive and expensive pixel-level annotations are required to train deep models, with object-oriented interactions simulated from manually labeled object masks. In this work, we reveal that informative interactions can be simulated through semantically consistent yet diverse region exploration in an unsupervised paradigm. Concretely, we introduce a Multi-granularity Interaction Simulation (MIS) approach that opens up a promising direction for unsupervised interactive segmentation. Drawing on the high-quality dense features produced by recent self-supervised models, we gradually merge patches or regions with similar features into more extensive regions, so that every merged region serves as a semantically meaningful multi-granularity proposal. By randomly sampling these proposals and simulating possible interactions based on them, we provide meaningful interactions at multiple granularities that teach the model to understand interactions. Our MIS significantly outperforms non-deep-learning unsupervised methods and is even comparable to some previous supervised deep methods, without using any annotation.
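To make the core idea concrete, the following is a minimal sketch (not the authors' implementation) of the simulation step described above: patches on a feature grid are greedily merged with their most similar spatial neighbors, every intermediate merged region is kept as a multi-granularity proposal, and a proposal is then sampled to place a simulated click. The feature extractor (e.g. a self-supervised ViT backbone) is stubbed out with random features, and all function names and parameters here are illustrative assumptions.

```python
# Hypothetical sketch of multi-granularity proposal generation and
# click simulation; random features stand in for a real self-supervised
# backbone, and this is a simplification of the method in the abstract.
import numpy as np

def build_proposals(feats, grid_h, grid_w, n_merges):
    """feats: (grid_h*grid_w, d) L2-normalized patch features.
    Greedily merge the most similar pair of adjacent regions n_merges
    times; return every intermediate region as a proposal (patch-id set)."""
    regions = {i: {i} for i in range(grid_h * grid_w)}    # region id -> patch set
    region_feat = {i: feats[i].copy() for i in regions}   # mean feature per region
    patch_owner = {i: i for i in range(grid_h * grid_w)}  # patch -> region id
    proposals = []

    def neighbors(p):                                     # 4-connected grid
        r, c = divmod(p, grid_w)
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < grid_h and 0 <= cc < grid_w:
                yield rr * grid_w + cc

    for _ in range(n_merges):
        # find the most feature-similar pair of spatially adjacent regions
        best, best_sim = None, -np.inf
        for rid, patches in regions.items():
            adj = {patch_owner[q] for p in patches for q in neighbors(p)} - {rid}
            for other in adj:
                sim = float(region_feat[rid] @ region_feat[other])
                if sim > best_sim:
                    best, best_sim = (rid, other), sim
        if best is None:
            break
        a, b = best
        regions[a] |= regions.pop(b)                      # merge b into a
        for p in regions[a]:
            patch_owner[p] = a
        f = feats[list(regions[a])].mean(axis=0)
        region_feat[a] = f / (np.linalg.norm(f) + 1e-8)
        region_feat.pop(b)
        proposals.append(frozenset(regions[a]))           # one proposal per merge
    return proposals

def simulate_click(proposal, grid_w, rng):
    """Sample a click location (row, col) uniformly inside a proposal."""
    p = int(rng.choice(sorted(proposal)))
    return divmod(p, grid_w)

rng = np.random.default_rng(0)
H = W = 14                                                # e.g. a 14x14 ViT patch grid
feats = rng.normal(size=(H * W, 64)).astype(np.float32)
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
props = build_proposals(feats, H, W, n_merges=150)
target = props[int(rng.integers(len(props)))]             # random granularity level
print("click at", simulate_click(target, W, rng),
      "in a region of", len(target), "patches")
```

In training, each sampled proposal would play the role of a pseudo ground-truth mask: the simulated click and the proposal form an (interaction, target) pair, replacing the manually annotated masks that supervised interactive segmentation normally requires.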