Bag of Tricks for Long-Tail Visual Recognition of Animal Species in Camera-Trap Images

Camera traps are widely used to monitor wildlife, and they collect large numbers of images. The number of images collected per species usually follows a long-tail distribution: a few classes have a large number of instances, while many species account for only a small percentage of the images. Although these rare species are in most cases the ones of interest to ecologists, they are often neglected when using deep-learning models, because these models require a large number of training images. In this work, a simple and effective framework called Square-Root Sampling Branch (SSB) is proposed, which combines two classification branches that are trained using square-root sampling and instance sampling to improve long-tail visual recognition. SSB is compared with state-of-the-art methods for this task: square-root sampling, class-balanced focal loss, and balanced group softmax. To reach a more general conclusion, the methods for handling long-tail visual recognition were systematically evaluated on four families of computer vision models (ResNet, MobileNetV3, EfficientNetV2, and Swin Transformer) and four camera-trap datasets with different characteristics. First, a robust baseline using the most recent training tricks was prepared; then, the methods for improving long-tail recognition were applied to it. Our experiments show that square-root sampling was the method that most improved performance on minority classes, by around 15%; however, this came at the cost of reducing accuracy on the majority classes by at least 3%. Our proposed framework (SSB) proved competitive with the other methods and achieved the best or second-best results on the tail classes in most cases. Unlike square-root sampling, however, its loss in performance on the head classes was minimal, so it achieved the best trade-off among all the evaluated methods.
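To make the two sampling schemes named in the abstract concrete, the following is a minimal sketch of the commonly used formulation in which class j is drawn with probability proportional to n_j^q, where n_j is the number of images of class j. With q = 1 this is instance sampling and with q = 0.5 it is square-root sampling. The function names, the species labels, and the counts are illustrative assumptions, not taken from the paper, and the sketch does not reproduce how SSB combines its two branches.

```python
from collections import Counter


def class_sampling_probs(class_counts, power):
    """Return per-class sampling probabilities with p_j proportional to n_j ** power.

    power = 1.0 gives instance sampling (classes drawn in proportion to size);
    power = 0.5 gives square-root sampling (the tail is drawn more often).
    """
    weights = {c: n ** power for c, n in class_counts.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


def per_image_weights(labels, power):
    """Spread each class probability evenly over the images of that class."""
    counts = Counter(labels)
    probs = class_sampling_probs(counts, power)
    return [probs[y] / counts[y] for y in labels]


# Hypothetical long-tailed dataset: 900 images of one species, 100 of another.
counts = {"deer": 900, "fox": 100}
instance_probs = class_sampling_probs(counts, 1.0)  # deer 0.9, fox 0.1
sqrt_probs = class_sampling_probs(counts, 0.5)      # deer 0.75, fox 0.25

labels = ["deer"] * 900 + ["fox"] * 100
image_weights = per_image_weights(labels, 0.5)
```

In this toy case the rare class goes from 10% of the draws under instance sampling to 25% under square-root sampling. Per-image weights of this kind could be passed to a weighted sampler such as PyTorch's `torch.utils.data.WeightedRandomSampler`, although the abstract does not say how the authors implemented sampling.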

