Label distributions in camera-trap images are highly imbalanced and long-tailed, so neural networks tend to be biased toward the frequently appearing head classes. Although long-tail learning has been extensively explored to address data imbalance, few studies have considered characteristics specific to camera traps, such as multi-domain and multi-frame setups. Here, we propose a unified framework and introduce two datasets for long-tailed camera-trap recognition. We first design domain experts, where each expert learns to balance the imperfect decision boundaries caused by data imbalance, and the experts complement one another to produce domain-balanced decision boundaries. We also propose a flow consistency loss that focuses the model on moving objects by encouraging the class activation maps of multi-frame inputs to match the optical flow maps of the input images. Moreover, we introduce two long-tailed camera-trap datasets, WCS-LT and DMZ-LT, to validate our methods. Experimental results demonstrate the effectiveness of our framework, and the proposed methods outperform previous methods on recessive-domain samples.
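The abstract does not give the form of the flow consistency loss, so the following is only a minimal sketch of one plausible reading: penalize the squared gap between each frame's normalized class activation map and the normalized magnitude of its optical flow. The function names (`normalize_map`, `flow_magnitude`, `flow_consistency_loss`), the min-max normalization, and the squared-error form are all assumptions made for illustration and are not taken from the paper.

```python
import math


def normalize_map(grid, eps=1e-8):
    """Min-max scale a 2-D list of numbers to roughly [0, 1] (assumed normalization)."""
    flat = [v for row in grid for v in row]
    lo, hi = min(flat), max(flat)
    return [[(v - lo) / (hi - lo + eps) for v in row] for row in grid]


def flow_magnitude(flow):
    """Per-pixel length of the (dx, dy) optical-flow vectors."""
    return [[math.hypot(dx, dy) for dx, dy in row] for row in flow]


def flow_consistency_loss(cams, flows):
    """Mean squared gap between normalized CAMs and normalized flow magnitudes.

    cams:  list of frames, each an H x W grid of activations (hypothetical layout).
    flows: list of frames, each an H x W grid of (dx, dy) tuples.
    """
    if not cams or len(cams) != len(flows):
        raise ValueError("cams and flows must be non-empty and equally long")
    total, count = 0.0, 0
    for cam, flow in zip(cams, flows):
        cam_n = normalize_map(cam)
        mag_n = normalize_map(flow_magnitude(flow))
        for cam_row, mag_row in zip(cam_n, mag_n):
            for c, m in zip(cam_row, mag_row):
                total += (c - m) ** 2
                count += 1
    return total / count


# Toy data: a 2x2 patch moves with flow (3, 4) inside a 4x4 frame, for two frames.
moving = {(1, 1), (1, 2), (2, 1), (2, 2)}
flow = [[(3.0, 4.0) if (r, c) in moving else (0.0, 0.0) for c in range(4)]
        for r in range(4)]
cam_on_animal = [[1.0 if (r, c) in moving else 0.0 for c in range(4)]
                 for r in range(4)]
cam_on_background = [[0.0 if (r, c) in moving else 1.0 for c in range(4)]
                     for r in range(4)]

loss_aligned = flow_consistency_loss([cam_on_animal] * 2, [flow] * 2)
loss_misaligned = flow_consistency_loss([cam_on_background] * 2, [flow] * 2)
```

In this toy setting, an activation map that fires on the moving patch yields a much smaller loss than one that fires on the static background, which is the behavior the abstract attributes to the loss. A trainable version would need a differentiable framework and flow maps from an actual optical flow estimator.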