Skeleton extraction aims to provide a compact representation of an object by extracting its skeleton from a given binary or RGB image. Many strong skeleton extraction methods have been proposed in recent years, but, as far as we know, little research has examined how to exploit the context information in the binary shape of an object. In this paper, we propose an attention-based model called Context Attention Network (CANet), which integrates a context extraction module into a UNet architecture and effectively improves the network's ability to extract skeleton pixels. We also use several novel techniques, including the distance transform and a weighted focal loss, to achieve good results on the given dataset. Finally, without model ensembling and with only 80% of the training images, our method achieves an F1 score of 0.822 during the development phase and 0.8507 during the final phase of the Pixel SkelNetOn Competition, ranking 1st on the leaderboard.
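The abstract names a focal loss with weighting but does not give its form. As a minimal sketch only, assuming the standard alpha-balanced binary focal loss rather than the authors' exact formulation, the idea can be illustrated in NumPy. The function name `weighted_focal_loss` and the default values of `alpha` and `gamma` are hypothetical and are not taken from the paper.

```python
import numpy as np


def weighted_focal_loss(probs, targets, alpha=0.75, gamma=2.0, eps=1e-7):
    """Mean alpha-balanced binary focal loss (illustrative sketch).

    probs   : predicted skeleton probabilities in [0, 1], any shape
    targets : binary ground-truth mask of the same shape (1 = skeleton pixel)
    alpha   : weight on the skeleton class (hypothetical default)
    gamma   : focusing exponent (hypothetical default)
    """
    probs = np.clip(np.asarray(probs, dtype=np.float64), eps, 1.0 - eps)
    targets = np.asarray(targets, dtype=np.float64)
    # p_t is the probability the model assigns to the true class of each pixel.
    p_t = np.where(targets == 1, probs, 1.0 - probs)
    # alpha_t up-weights the rare skeleton pixels relative to the background.
    alpha_t = np.where(targets == 1, alpha, 1.0 - alpha)
    # (1 - p_t)^gamma shrinks the contribution of well-classified pixels.
    loss = -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)
    return float(loss.mean())
```

With `gamma=0` and `alpha=0.5` this reduces to half the ordinary binary cross-entropy, which is a convenient sanity check. Skeleton pixels make up a small fraction of each image, so a class weight above 0.5 is a plausible choice, but the appropriate value would have to be tuned on the dataset.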