In this paper, we explore mask representation in instance segmentation with Point-of-Interest (PoI) features. Differentiating multiple potential instances within a single PoI feature is challenging because learning a high-dimensional mask feature for each instance with vanilla convolution imposes a heavy computational burden. To address this challenge, we propose an instance-aware convolution. It decomposes the mask representation learning task into two tractable components: instance-aware weights and instance-agnostic features. The former parametrize the convolution that produces mask features for different instances, which improves mask learning efficiency by avoiding several independent convolutions. The latter serve as mask templates at a single point. Together, they yield instance-aware mask features, computed by convolving the template with the dynamic weights, which are then used for mask prediction. Along with instance-aware convolution, we propose PointINS, a simple and practical instance segmentation approach built upon dense one-stage detectors. Through extensive experiments, we evaluate the effectiveness of our framework built upon RetinaNet and FCOS. With a ResNet101 backbone, PointINS achieves 38.3 mask mean average precision (mAP) on the COCO dataset, outperforming existing point-based methods by a large margin. It performs comparably to the region-based Mask R-CNN while offering faster inference.
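The decomposition described above can be illustrated with a minimal sketch in plain Python. It is not the PointINS implementation: the 1x1 kernel, the tensor shapes, the hard-coded weights, and the names `pointwise_conv` and `instance_aware_conv` are all assumptions made for illustration. In the paper the dynamic weights are predicted by the network; here they are written by hand to show how one shared, instance-agnostic template can yield different mask features per instance.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]  # rows x cols


def pointwise_conv(template: Matrix, weights: Matrix) -> Matrix:
    """Apply a 1x1 convolution (assumed kernel size, for illustration).

    template: C_in x N, where N is the number of flattened mask positions.
    weights:  C_out x C_in.
    Returns a C_out x N matrix of mask features.
    """
    c_in = len(template)
    if any(len(row) != c_in for row in weights):
        raise ValueError("each weight row must have one entry per input channel")
    n = len(template[0])
    return [
        [sum(w_row[c] * template[c][p] for c in range(c_in)) for p in range(n)]
        for w_row in weights
    ]


def instance_aware_conv(template: Matrix, per_instance_weights: List[Matrix]) -> List[Matrix]:
    """Convolve one shared template with a separate weight set per instance."""
    return [pointwise_conv(template, w) for w in per_instance_weights]


# Hypothetical instance-agnostic template at one PoI:
# 2 channels over a flattened 2x2 mask grid.
template = [
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
]

# Hypothetical instance-aware weights, one (C_out=1 x C_in=2) set per instance.
weights_a = [[1.0, 0.0]]
weights_b = [[0.0, 1.0]]

masks = instance_aware_conv(template, [weights_a, weights_b])
for i, m in enumerate(masks):
    print(f"instance {i}: {m}")
```

The template is computed once and reused for every instance. Only the small weight sets differ, which is the efficiency argument the abstract makes against running several independent convolutions.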