The ocean is experiencing unprecedented rapid change, and visually monitoring marine biota at the spatiotemporal scales needed for responsible stewardship is a formidable task. As the research community seeks baselines, the volume and rate of the required data collection rapidly outpace our ability to process and analyze the data. Recent advances in machine learning enable fast, sophisticated analysis of visual data, but have had limited success in the ocean due to a lack of data standardization, insufficient formatting, and the demand for large, labeled datasets. To address this need, we built FathomNet, an open-source image database that standardizes and aggregates expertly curated labeled data. FathomNet has been seeded with existing iconic and non-iconic imagery of marine animals, underwater equipment, debris, and other concepts, and allows for future contributions from distributed data sources. We demonstrate how FathomNet data can be used to train and deploy models on other institutional video to reduce annotation effort, and to enable automated tracking of underwater concepts when integrated with robotic vehicles. As FathomNet continues to grow and incorporate more labeled data from the community, we can accelerate the processing of visual data to achieve a healthy and sustainable global ocean.