This study examines the effectiveness of spatio-temporal modeling and the integration of spatial attention mechanisms in deep learning models for underwater object detection. In the first phase, the performance of T-YOLOv5, a temporal-enhanced variant of YOLOv5, is evaluated against the standard YOLOv5. In the second phase, an augmented version of T-YOLOv5 is developed by adding a Convolutional Block Attention Module (CBAM). By examining the effectiveness of the existing YOLOv5 and T-YOLOv5 models and of the newly developed T-YOLOv5 with CBAM, the research highlights how temporal modeling improves detection accuracy in dynamic marine environments, particularly under conditions of sudden movements, partial occlusions, and gradual motion. In testing, YOLOv5 achieved a mAP@50-95 of 0.563, while T-YOLOv5 and T-YOLOv5 with CBAM outperformed it with mAP@50-95 scores of 0.813 and 0.811, respectively, highlighting their superior accuracy and generalization in detecting complex objects. The findings demonstrate that T-YOLOv5 significantly enhances detection reliability compared to the standard model, while T-YOLOv5 with CBAM further improves performance in challenging scenarios, at the cost of some accuracy in simpler scenarios.