We propose HCFormer, a hierarchical clustering-based image segmentation scheme for deep neural networks. We interpret image segmentation, including semantic, instance, and panoptic segmentation, as a pixel clustering problem and solve it with bottom-up hierarchical clustering performed by deep neural networks. This hierarchical clustering removes the pixel decoder from conventional segmentation models and simplifies the segmentation pipeline, which improves both segmentation accuracy and interpretability. Because pixel clustering is a formulation common to all of these segmentation tasks, HCFormer handles semantic, instance, and panoptic segmentation with the same architecture. In experiments, HCFormer achieves segmentation accuracy comparable or superior to that of baseline methods on semantic segmentation (55.5 mIoU on ADE20K), instance segmentation (47.1 AP on COCO), and panoptic segmentation (55.7 PQ on COCO).
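To make the idea of bottom-up pixel clustering more concrete, here is a minimal, hand-crafted sketch. It is not HCFormer's learned implementation. The following are all hypothetical choices made for illustration: the 2x2 block-mean initialization of cluster centers, restricting candidates to the 3x3 coarse neighbours of each cell's parent block, hard nearest-center assignment by squared distance, and the function names. At each level, every cell is assigned to a coarser cluster. Composing these assignments maps each original pixel to a final coarse cluster. As a result, labels predicted at the coarse level can be propagated back to full resolution without a separate pixel decoder.

```python
from typing import List, Tuple

Vec = List[float]
Grid = List[List[Vec]]  # H x W grid of feature vectors


def sq_dist(a: Vec, b: Vec) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def block_means(feat: Grid) -> Grid:
    """Hypothetical initial centers: mean feature of each non-overlapping 2x2 block."""
    h, w = len(feat), len(feat[0])
    out = []
    for i in range(0, h, 2):
        row = []
        for j in range(0, w, 2):
            members = [feat[y][x] for y in range(i, min(i + 2, h))
                       for x in range(j, min(j + 2, w))]
            row.append([sum(c) / len(members) for c in zip(*members)])
        out.append(row)
    return out


def cluster_level(feat: Grid) -> Tuple[List[List[Tuple[int, int]]], Grid]:
    """One clustering level: assign each cell to the nearest center among the 3x3
    coarse neighbours of its parent block, then recompute the centers."""
    centers = block_means(feat)
    ch, cw = len(centers), len(centers[0])
    h, w = len(feat), len(feat[0])
    assign = []
    for y in range(h):
        row = []
        for x in range(w):
            py, px = y // 2, x // 2
            candidates = [(cy, cx) for cy in range(py - 1, py + 2)
                          for cx in range(px - 1, px + 2)
                          if 0 <= cy < ch and 0 <= cx < cw]
            row.append(min(candidates,
                           key=lambda c: sq_dist(feat[y][x], centers[c[0]][c[1]])))
        assign.append(row)
    # Recompute each center as the mean of its members; keep the old center if empty.
    dim = len(feat[0][0])
    sums = [[[0.0] * dim for _ in range(cw)] for _ in range(ch)]
    counts = [[0] * cw for _ in range(ch)]
    for y in range(h):
        for x in range(w):
            cy, cx = assign[y][x]
            counts[cy][cx] += 1
            for k in range(dim):
                sums[cy][cx][k] += feat[y][x][k]
    new_centers = [[[s / counts[i][j] for s in sums[i][j]] if counts[i][j] else centers[i][j]
                    for j in range(cw)] for i in range(ch)]
    return assign, new_centers


def hierarchical_segment(feat: Grid, levels: int) -> List[List[int]]:
    """Compose per-level assignments so every pixel maps to a final coarse cluster id."""
    h, w = len(feat), len(feat[0])
    owner = [[(y, x) for x in range(w)] for y in range(h)]  # pixel -> cell at current level
    cur = feat
    for _ in range(levels):
        assign, cur = cluster_level(cur)
        owner = [[assign[cy][cx] for (cy, cx) in row] for row in owner]
    cw = len(cur[0])
    return [[cy * cw + cx for (cy, cx) in row] for row in owner]


# Usage: a 4x4 image whose left half has feature 0.0 and right half has feature 1.0.
feat = [[[0.0], [0.0], [1.0], [1.0]] for _ in range(4)]
labels = hierarchical_segment(feat, levels=1)  # [[0, 0, 1, 1]] * 4
```

In this toy example, pixels with identical features merge across block boundaries into a single cluster per half. This illustrates why clustering, rather than fixed upsampling, can follow object boundaries. In a learned model, the features and the similarity would be produced by the network instead of being fixed by hand.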