Underwater caves are challenging environments that are crucial for water resource management and for our understanding of hydrogeology and history. Mapping underwater caves is a time-consuming, labor-intensive, and hazardous operation. For autonomous cave mapping by underwater robots, the major challenge lies in vision-based estimation in the complete absence of ambient light, where the motion of the camera-light setup produces constantly moving shadows. Thus, detecting and following the caveline as navigation guidance is paramount for robots in autonomous cave mapping missions. In this paper, we present a computationally light caveline detection model based on a novel Vision Transformer (ViT)-based learning pipeline. We address the problem of scarce annotated training data with a weakly supervised formulation in which learning is reinforced through a series of noisy predictions from intermediate sub-optimal models. We validate the utility and effectiveness of such weak supervision for caveline detection and tracking at cave locations in three countries: the USA, Mexico, and Spain. Experimental results demonstrate that our proposed model, CL-ViT, balances the robustness-efficiency trade-off, ensuring good generalization performance while running at 10+ FPS on single-board devices (Jetson TX2).