This work reviews the problem of object detection in underwater environments. We analyse and quantify the shortcomings of conventional state-of-the-art (SOTA) computer vision algorithms when applied to this challenging environment, and provide insights and general guidelines for future research efforts. First, we assess whether pretraining on the conventional ImageNet dataset is beneficial when the object detector must be applied to environments that may be characterised by a different feature distribution. We then investigate whether two-stage detectors yield better performance than single-stage detectors in terms of accuracy, intersection over union (IoU), floating-point operations per second (FLOPS), and inference time. Finally, we assess the generalisation capability of each model on a lower-quality dataset to simulate performance in a real scenario, in which harsher conditions ought to be expected. Our experimental results provide evidence that underwater object detection requires searching for "ad-hoc" architectures rather than merely training SOTA architectures on new data, and that pretraining is not beneficial.
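For readers unfamiliar with the IoU metric named above, the following is a minimal sketch of how it is commonly computed for two axis-aligned bounding boxes. The `(x1, y1, x2, y2)` corner format and the function name `iou` are assumptions made for illustration; the abstract does not specify the box representation or the evaluation code used in this work.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are assumed (for illustration) to be (x1, y1, x2, y2)
    tuples with x1 <= x2 and y1 <= y2.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region, clamped at zero
    # so that disjoint boxes give an empty intersection.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    # Union = sum of both areas minus the doubly counted overlap.
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    # Guard against degenerate (zero-area) boxes.
    return inter / union if union > 0 else 0.0
```

A predicted box that coincides with the ground-truth box scores 1, and a box with no overlap scores 0; intermediate values measure how well the detector localises the object.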