We present the Caltech Fish Counting Dataset (CFC), a large-scale dataset for detecting, tracking, and counting fish in sonar videos. We identify sonar videos as a rich source of data for advancing low signal-to-noise computer vision applications and tackling domain generalization in multiple-object tracking (MOT) and counting. In comparison to existing MOT and counting datasets, which are largely restricted to videos of people and vehicles in cities, CFC is sourced from a natural-world domain where targets are not easily resolvable and appearance features cannot be easily leveraged for target re-identification. With over half a million annotations in over 1,500 videos sourced from seven different sonar cameras, CFC allows researchers to train MOT and counting algorithms and evaluate generalization performance at unseen test locations. We perform extensive baseline experiments and identify key challenges and opportunities for advancing the state of the art in generalization in MOT and counting.