We propose a saliency-based, multi-target detection and segmentation framework for multi-aspect, semi-coherent imagery formed from circular-scan synthetic-aperture sonar (CSAS). Our framework relies on a multi-branch convolutional encoder-decoder network (MB-CEDN). The encoder extracts features from one or more CSAS images of the targets. These features are then split and fed into multiple decoders that perform pixel-level classification: one decoder, trained without supervision, roughly masks the targets, while another, trained with supervision, labels foreground and background pixels. Each of these target-detection estimates provides a different perspective on what constitutes a target. These opinions are cascaded into a deep-parsing network that models contextual and spatial constraints, isolating targets better than either estimate alone. We evaluate our framework on real-world CSAS data spanning five broad target classes. Since we are the first to consider both CSAS target detection and segmentation, we adapt existing image- and video-processing network topologies from the literature for comparison. We show that our framework outperforms supervised deep networks and greatly outperforms state-of-the-art unsupervised approaches across diverse target and seafloor types.
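The multi-branch pipeline described above (shared encoder, parallel decoder branches whose per-pixel opinions are then fused) can be illustrated with a minimal NumPy sketch. This is not the MB-CEDN itself: pooling stands in for the learned convolutional encoder, a deviation-from-mean score stands in for the unsupervised saliency branch, a threshold stands in for the supervised foreground/background branch, and a simple average stands in for the deep-parsing fusion. All function names and parameters are illustrative assumptions.

```python
import numpy as np

def encoder(img, levels=2):
    # Shared encoder stand-in: repeated 2x2 average pooling in place of
    # learned, strided convolutions.
    feats = img
    for _ in range(levels):
        h, w = feats.shape
        feats = feats[: h // 2 * 2, : w // 2 * 2] \
            .reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return feats

def upsample(feats, factor):
    # Decoder upsampling stand-in: nearest-neighbor expansion back to image size.
    return np.kron(feats, np.ones((factor, factor)))

def saliency_branch(feats):
    # "Unsupervised" branch proxy: deviation from the mean response as a
    # crude per-pixel saliency score.
    return np.abs(feats - feats.mean())

def detection_branch(feats, threshold=0.5):
    # "Supervised" branch proxy: thresholded foreground/background labels.
    return (feats > threshold).astype(float)

def fuse(saliency, detection):
    # Fusion stand-in for the deep-parsing network: average the two
    # per-pixel opinions after normalizing the saliency scores.
    return 0.5 * (saliency / (saliency.max() + 1e-8) + detection)

# Toy scene: a bright square "target" on a dark seafloor.
img = np.zeros((16, 16))
img[4:12, 4:12] = 1.0

feats = encoder(img)
mask = fuse(upsample(saliency_branch(feats), 4),
            upsample(detection_branch(feats), 4))
```

The fused `mask` scores target pixels higher than background pixels because both branches agree there, which is the intuition behind cascading multiple detection estimates into a single segmentation.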