Underwater acoustic cameras are high-potential devices for many applications in ecology, notably for fisheries management and monitoring. However, extracting high-value information from such data without a time-consuming reading of the entire dataset by an operator remains a challenge. Moreover, the analysis of acoustic imaging, owing to its low signal-to-noise ratio, is a perfect training ground for experimenting with new approaches, especially Deep Learning techniques. We present here a novel approach that takes advantage of both CNN (Convolutional Neural Network) and classical CV (Computer Vision) techniques and is able to detect a generic class "fish" in acoustic video streams. The pipeline pre-processes the acoustic images to extract two features, in order to localise the signals and thereby improve detection performance. To assess performance from an ecological point of view, we also propose a two-step validation: one step to validate the results of the trainings and one to test the method in a real-world scenario. The YOLOv3-based model was trained with data of fish from multiple species recorded by the two most common acoustic cameras, DIDSON and ARIS, including species of high ecological interest such as Atlantic salmon and European eel. The model we developed provides satisfying results, detecting almost 80% of fish while minimising the false-positive rate; however, it is much less efficient for eel detection on ARIS videos. This first CNN pipeline for fish monitoring that exploits video data from two models of acoustic camera satisfies most of the required features. Many challenges remain, such as the automation of fish species identification through a multiclass model. Nevertheless, the results point to a new solution for dealing with complex data such as sonar data, which can also be reapplied in other cases where the signal-to-noise ratio is a challenge.
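To make the pre-processing idea concrete, the sketch below illustrates one common way to localise echoes in a low signal-to-noise acoustic frame: background subtraction followed by thresholding, yielding a candidate bounding box for a detector. This is a minimal illustrative example only, not the paper's actual pipeline; the function name, the threshold value, and the choice of these particular operations as the two extracted features are all assumptions.

```python
import numpy as np

def preprocess_frame(frame, background, threshold=0.2):
    """Suppress the static background and binarise an acoustic frame.

    frame, background: 2-D float arrays of echo intensities in [0, 1].
    Returns (foreground, mask): the background-subtracted intensities
    and a boolean mask of pixels exceeding `threshold`.
    """
    foreground = np.clip(frame - background, 0.0, None)
    mask = foreground > threshold
    return foreground, mask

# Toy example: a flat background with one bright simulated "fish" echo.
background = np.full((8, 8), 0.1)
frame = background.copy()
frame[3:5, 2:6] = 0.9  # simulated fish echo

fg, mask = preprocess_frame(frame, background)
ys, xs = np.nonzero(mask)
# Candidate bounding box (x0, y0, x1, y1) to pass to the detector.
bbox = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))
```

In practice such a mask would be computed against a running background estimate over the video stream, and the localised regions would be fed to the CNN detector rather than the raw frame.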