Diffusion frameworks have achieved performance comparable to previous state-of-the-art image generation models. Their powerful noise-to-image denoising pipeline has made researchers curious about variants for discriminative tasks. This paper proposes DiffusionInst, a novel framework that represents instances as instance-aware filters and formulates instance segmentation as a noise-to-filter denoising process. The model is trained to reverse noisy ground truth without any inductive bias from an RPN. During inference, it takes randomly generated filters as input and outputs masks through one-step or multi-step denoising. Extensive experimental results on COCO and LVIS show that DiffusionInst achieves competitive performance compared to existing instance segmentation models with various backbones, such as ResNet and Swin Transformers. We hope our work can serve as a strong baseline and inspire the design of more efficient diffusion frameworks for challenging discriminative tasks. Our code is available at https://github.com/chenhaoxing/DiffusionInst.
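The noise-to-filter idea above can be illustrated with a toy sketch: start from pure Gaussian noise in filter space and iteratively refine it toward clean instance-aware filters. Everything here is an assumption for illustration only — the dimensions (`NUM_INSTANCES`, `FILTER_DIM`), the stand-in `toy_model`, and the simple interpolation update are hypothetical; the actual paper uses a learned network and a proper diffusion (DDIM-style) sampler.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 100 instance queries, 153-d mask filters.
# These numbers are illustrative assumptions, not taken from the abstract.
NUM_INSTANCES, FILTER_DIM = 100, 153

def denoise_filters(model, steps=4):
    """Multi-step noise-to-filter denoising (toy sketch).

    Starts from randomly generated filters (pure Gaussian noise) and
    repeatedly refines them with `model`, which predicts the clean
    filters. A real implementation would use a trained network and a
    DDIM-style update rule instead of this plain interpolation.
    """
    x = rng.standard_normal((NUM_INSTANCES, FILTER_DIM))
    for t in np.linspace(1.0, 0.0, steps, endpoint=False):
        x0_pred = model(x, t)            # predicted clean filters at time t
        x = t * x + (1.0 - t) * x0_pred  # move toward the prediction as t -> 0
    return x

# Stand-in "model": always pulls filters toward a fixed target,
# playing the role of the learned denoiser.
target = rng.standard_normal((NUM_INSTANCES, FILTER_DIM))
toy_model = lambda x, t: target

filters = denoise_filters(toy_model, steps=4)
```

After a few steps the filters converge close to the model's prediction, mirroring how multi-step denoising trades extra compute for refinement, while a single step corresponds to the one-step inference mode mentioned above.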