In applied image segmentation tasks, the ability to provide numerous and precise labels for training is paramount to the accuracy of the model at inference time. However, this overhead is often neglected, and recently proposed segmentation architectures rely heavily on the availability and fidelity of ground truth labels to achieve state-of-the-art accuracies. Failure to acknowledge the difficulty of creating adequate ground truths can lead to an over-reliance on pre-trained models or a lack of adoption in real-world applications. We introduce Points2Polygons (P2P), a model which makes use of contextual metric learning techniques that directly address this problem. Points2Polygons performs well against existing fully-supervised segmentation baselines with limited training data, despite using a lightweight segmentation model (a U-Net with a ResNet18 backbone), having access only to weak labels in the form of object centroids, and using no pre-training. We demonstrate this on several small but non-trivial datasets. We show that metric learning using contextual data provides key insights for self-supervised tasks in general, and allows segmentation models to easily generalize across traditionally label-intensive domains in computer vision.