TempNet: Temporal Attention Towards the Detection of Animal Behaviour in Videos

Recent advancements in cabled ocean observatories have increased the quality and prevalence of underwater videos; this data enables the extraction of high-level, biologically relevant information such as species' behaviours. Despite this increase in capability, most modern methods for the automatic interpretation of underwater videos focus only on the detection and counting of organisms. We propose an efficient computer vision- and deep learning-based method for the detection of biological behaviours in videos. TempNet uses an encoder bridge and residual blocks to maintain model performance with a two-stage encoder that operates spatially first and then temporally. TempNet also introduces temporal attention during spatial encoding, as well as Wavelet Down-Sampling pre-processing, to improve model accuracy. Although our system is designed to be generic, that is, applicable to diverse fish behaviours, we demonstrate its application to the detection of sablefish (Anoplopoma fimbria) startle events. We compare the proposed approach with a state-of-the-art end-to-end video detection method (ReMotENet) and with a hybrid method previously proposed specifically for the detection of sablefish startle events, using videos from an existing dataset. Results show that our method outperforms both baselines in multiple metrics, reaching a per-clip accuracy of 80% and a per-clip precision of 0.81. This represents a relative improvement of 31% in accuracy and 27% in precision over the compared methods on this dataset. Our computational pipeline is also highly efficient, processing each 4-second video clip in only 38 ms. Furthermore, since it does not employ features specific to sablefish startle events, our system can be easily extended to other behaviours in future work.
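As an illustration of the Wavelet Down-Sampling pre-processing mentioned above, the following is a minimal sketch rather than the authors' implementation. The abstract does not state which wavelet family is used or which sub-bands are kept, so the single-level Haar transform, the choice to keep only the approximation (LL) band, and the function names `haar_downsample` and `downsample_clip` are all assumptions made for this example.

```python
import numpy as np


def haar_downsample(frame: np.ndarray) -> np.ndarray:
    """Halve a greyscale frame with a single-level 2D Haar transform.

    Assumption: only the approximation (LL) band is kept. The source
    does not specify the wavelet or the retained sub-bands.
    """
    if frame.ndim != 2:
        raise ValueError("expected a 2D greyscale frame")
    height, width = frame.shape
    if height % 2 or width % 2:
        raise ValueError("frame height and width must be even")
    f = frame.astype(np.float64)
    # Orthonormal Haar low-pass in both directions: each output pixel is
    # the sum of a 2x2 block scaled by 1/2.
    return (f[0::2, 0::2] + f[0::2, 1::2] + f[1::2, 0::2] + f[1::2, 1::2]) / 2.0


def downsample_clip(clip: np.ndarray) -> np.ndarray:
    """Apply haar_downsample to every frame of a (T, H, W) clip."""
    return np.stack([haar_downsample(frame) for frame in clip])


# Hypothetical usage: a clip of 4 frames, each 8x6 pixels.
example_clip = np.ones((4, 8, 6))
reduced_clip = downsample_clip(example_clip)
```

Under these assumptions, each frame loses half of its height and half of its width, so a later encoder would see a quarter of the original pixels per frame. This kind of reduction is one plausible way such a pre-processing step could contribute to a low per-clip processing time.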
