The recent explosion in the availability of echosounder data from diverse ocean platforms has created unprecedented opportunities to observe marine ecosystems at broad scales. However, the critical lack of methods capable of automatically discovering and summarizing prominent spatio-temporal echogram structures has limited the effective and wider use of these rich datasets. To address this challenge, we develop a data-driven methodology based on matrix decomposition that builds compact representations of long-term echosounder time series using intrinsic features in the data. In a two-stage approach, we first remove noisy outliers from the data by Principal Component Pursuit, then employ a temporally smooth Nonnegative Matrix Factorization to automatically discover a small number of distinct daily echogram patterns, whose time-varying linear combination (activation) reconstructs the dominant echogram structures. This low-rank representation provides biological information that is more tractable and interpretable than the original data, and is suitable for visualization and systematic analysis with other ocean variables. Unlike existing methods that rely on fixed, handcrafted rules, our unsupervised machine learning approach is well-suited for extracting information from data collected from unfamiliar or rapidly changing ecosystems. This work forms the basis for constructing robust time series analytics for large-scale, acoustics-based biological observation in the ocean.
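The two-stage approach described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `pcp` and `smooth_nmf`, the fixed-step augmented Lagrangian solver, the default weights (`lam`, `mu`, `beta`), and the multiplicative update rule with a first-difference smoothness penalty are all assumptions chosen for brevity. The sketch also assumes that each column of the input matrix is one vectorized daily echogram, that there are at least two days, and that values are nonnegative once the low-rank part has been clipped.

```python
import numpy as np

def pcp(M, tol=1e-6, max_iter=1000):
    """Principal Component Pursuit: split M into low-rank L plus sparse S.

    Solves min ||L||_* + lam * ||S||_1  s.t.  L + S = M with a fixed-step
    augmented Lagrangian iteration (commonly used defaults for lam and mu).
    """
    M = np.asarray(M, dtype=float)
    lam = 1.0 / np.sqrt(max(M.shape))           # assumed default weight
    mu = M.size / (4.0 * np.abs(M).sum())       # assumed default step
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                        # Lagrange multipliers
    norm_M = np.linalg.norm(M)
    for _ in range(max_iter):
        # Low-rank step: soft-threshold the singular values.
        U, s, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * np.maximum(s - 1.0 / mu, 0.0)) @ Vt
        # Sparse step: soft-threshold the entries.
        R = M - L + Y / mu
        S = np.sign(R) * np.maximum(np.abs(R) - lam / mu, 0.0)
        resid = M - L - S
        Y = Y + mu * resid
        if np.linalg.norm(resid) <= tol * norm_M:
            break
    return L, S

def smooth_nmf(X, rank, beta=1.0, n_iter=300, seed=0, eps=1e-9):
    """NMF X ~ W @ H with a penalty beta * sum_t ||H[:, t] - H[:, t-1]||^2.

    W (features x rank) holds the patterns, H (rank x days) the activations.
    Uses multiplicative updates, which keep W and H nonnegative.
    """
    rng = np.random.default_rng(seed)
    n, T = X.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, T)) + eps
    # Degrees of the path graph linking consecutive days.
    deg = np.full(T, 2.0)
    deg[0] = deg[-1] = 1.0
    for _ in range(n_iter):
        # Sum of temporal neighbours of every activation column.
        HA = np.zeros_like(H)
        HA[:, 1:] += H[:, :-1]
        HA[:, :-1] += H[:, 1:]
        H *= (W.T @ X + beta * HA) / (W.T @ W @ H + beta * H * deg + eps)
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
        # Fix the scale ambiguity: unit-norm patterns, rescaled activations.
        scale = np.linalg.norm(W, axis=0) + eps
        W /= scale
        H *= scale[:, None]
    return W, H
```

A typical call sequence under these assumptions would be `L, S = pcp(M)` followed by `W, H = smooth_nmf(np.clip(L, 0, None), rank=3)`, where the columns of `W` play the role of the daily echogram patterns and the rows of `H` their time-varying activations. The rank and `beta` are illustrative values, not settings taken from the paper.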