Marine visual understanding is essential for monitoring and protecting marine ecosystems, enabling automatic and scalable biological surveys. However, progress is hindered by limited training data and by the lack of a systematic task formulation that maps domain-specific marine challenges onto well-defined computer vision tasks, which limits how effectively models can be applied. To address this gap, we present ORCA, a multi-modal benchmark for marine research comprising 14,647 images from 478 species, with 42,217 bounding box annotations and 22,321 expert-verified instance captions. The dataset provides fine-grained visual and textual annotations that capture morphology-oriented attributes across diverse marine species. To catalyze methodological advances, we evaluate 18 state-of-the-art models on three tasks: object detection (closed-set and open-vocabulary), instance captioning, and visual grounding. The results highlight key challenges, including species diversity, morphological overlap, and specialized domain demands, underscoring the difficulty of marine visual understanding. ORCA thus establishes a comprehensive benchmark to advance research in the marine domain. Project Page: http://orca.hkustvgd.com/.