Distinguishing among marine benthic habitat characteristics is of key importance in a wide range of seabed operations, from installing oil rigs and laying cable networks to monitoring human impact on marine ecosystems. Side-Scan Sonar (SSS) is a widely used imaging sensor for this purpose: it produces high-resolution seafloor maps by logging the intensities of sound waves reflected back from the seafloor. In this work, we leverage these acoustic intensity maps to produce a pixel-wise classification of different seafloor types. We propose a novel architecture adapted from the Vision Transformer (ViT) in an encoder-decoder framework and, in doing so, evaluate the applicability of ViTs to smaller datasets. To compensate for the lack of CNN-like inductive biases, and thereby make ViTs better suited to low-data regimes, we propose a novel feature extraction module that replaces the Multi-layer Perceptron (MLP) block within transformer layers, as well as a novel module for extracting multiscale patch embeddings. We also propose a lightweight decoder that complements this design and further boosts multiscale feature extraction. With the modified architecture, we achieve state-of-the-art results while also meeting real-time computational requirements. Our code is available at~\url{https://github.com/hayatrajani/s3seg-vit}.
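The abstract does not describe the internals of the proposed feature extraction module. The NumPy code below is only an illustrative sketch of the underlying idea. A ViT's MLP is applied to each token independently and has no notion of spatial neighbourhood, whereas a convolution does. The code contrasts a standard token-wise MLP with a hypothetical convolutional feed-forward block made of a pointwise expansion, a depthwise 3×3 convolution and a pointwise projection. The block design, the weight shapes and every name (`token_mlp`, `conv_ffn`, `depthwise_conv3x3`, `changed`) are assumptions for illustration. This is not the authors' module from the linked repository.

```python
import numpy as np


def gelu(x):
    # tanh approximation of the GELU activation commonly used in ViT MLPs
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def token_mlp(tokens, w1, w2):
    # Standard ViT-style MLP: the same two linear layers applied to every token
    # independently, so there is no mixing between neighbouring patches.
    return gelu(tokens @ w1) @ w2


def depthwise_conv3x3(x, kernels):
    # Depthwise 3x3 convolution (cross-correlation, as in CNN layers) with zero
    # padding. x: (H, W, C) feature map; kernels: (3, 3, C), one filter per channel.
    h, w, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x, dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w, :] * kernels[dy, dx, :]
    return out


def conv_ffn(tokens, grid_hw, w1, dw_kernels, w2):
    # Hypothetical convolutional feed-forward block used in place of token_mlp.
    # tokens: (N, C) patch tokens in row-major grid order, N = H * W (no class token).
    h, w = grid_hw
    n = tokens.shape[0]
    assert n == h * w, "token count must match the patch grid"
    x = tokens @ w1                              # pointwise (1x1) expansion
    x = x.reshape(h, w, -1)                      # restore the 2D patch layout
    x = gelu(depthwise_conv3x3(x, dw_kernels))   # local 3x3 spatial mixing
    return x.reshape(n, -1) @ w2                 # pointwise projection back to C


# Demo: perturb one token and see which output tokens change.
rng = np.random.default_rng(0)
H, W, C, C_HID = 4, 4, 8, 16
w1 = rng.normal(size=(C, C_HID)) * 0.1
dw = rng.normal(size=(3, 3, C_HID)) * 0.1
w2 = rng.normal(size=(C_HID, C)) * 0.1
tokens = rng.normal(size=(H * W, C))
perturbed = tokens.copy()
perturbed[1 * W + 1] += 1.0                      # token at grid position (1, 1)


def changed(f):
    # Boolean (H, W) map of output tokens affected by the perturbation
    diff = np.abs(f(perturbed) - f(tokens)).max(axis=1)
    return (diff > 1e-12).reshape(H, W)


mlp_changed = changed(lambda t: token_mlp(t, w1, w2))
conv_changed = changed(lambda t: conv_ffn(t, (H, W), w1, dw, w2))
print(mlp_changed.sum(), conv_changed.sum())
```

Perturbing the token at grid position (1, 1) changes only that token's output under `token_mlp`. Under `conv_ffn`, it changes the outputs of the token's whole 3×3 neighbourhood. This locality is one form of the CNN-like inductive bias that plain ViT layers lack.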