Existing camouflaged object detection (COD) methods rely heavily on large-scale datasets with pixel-wise annotations. However, due to the ambiguous boundary, it is very time-consuming and labor-intensive to annotate camouflage objects pixel-wisely (which takes ~ 60 minutes per image). In this paper, we propose the first weakly-supervised camouflaged object detection (COD) method, using scribble annotations as supervision. To achieve this, we first construct a scribble-based camouflaged object dataset with 4,040 images and corresponding scribble annotations. It is worth noting that annotating the scribbles used in our dataset takes only ~ 10 seconds per image, which is 360 times faster than per-pixel annotations. However, the network directly using scribble annotations for supervision will fail to localize the boundary of camouflaged objects and tend to have inconsistent predictions since scribble annotations only describe the primary structure of objects without details. To tackle this problem, we propose a novel consistency loss composed of two parts: a reliable cross-view loss to attain reliable consistency over different images, and a soft inside-view loss to maintain consistency inside a single prediction map. Besides, we observe that humans use semantic information to segment regions near boundaries of camouflaged objects. Therefore, we design a feature-guided loss, which includes visual features directly extracted from images and semantically significant features captured by models. Moreover, we propose a novel network that detects camouflaged objects by scribble learning on structural information and semantic relations. Experimental results show that our model outperforms relevant state-of-the-art methods on three COD benchmarks with an average improvement of 11.0% on MAE, 3.2% on S-measure, 2.5% on E-measure and 4.4% on weighted F-measure.
翻译:已有的伪装对象检测方法(COD) 严重依赖大型的隐蔽对象检测(COD) 。 但是,由于模糊的边界, 它非常耗时和劳动密集, 以像素为方向( 每张图像需要 ~ 60 分钟 ) 来批注伪装对象的隐蔽对象( COD ) 。 在本文中, 我们建议了第一个薄弱的、 监视不力的隐蔽的隐蔽物体检测( COD) 方法, 使用刻字说明作为监督。 为了实现这一点, 我们首先建了一个有4 040 图像和相应的刻字说明的基于刻字的隐蔽的隐形对象数据集。 值得注意的是, 我们数据集中使用的刻字的隐形对象( E) 缩略图的缩略图只用了10 10 秒, 直径( Eliveral) 图像的缩略图中, 直略图中显示的缩略图中的缩略图 。