Training on datasets with long-tailed distributions has been challenging for major recognition tasks such as classification and detection. To address this challenge, image resampling is typically introduced as a simple yet effective approach. However, we observe that long-tailed detection differs from classification, since multiple classes may be present in a single image. As a result, image resampling alone is not enough to yield a sufficiently balanced distribution at the object level. We address object-level resampling by introducing an object-centric memory replay strategy based on dynamic, episodic memory banks. Our proposed strategy has two benefits: 1) convenient object-level resampling without significant extra computation, and 2) implicit feature-level augmentation from model updates. We show that resampling at both the image level and the object level is important, and thus unify the two with a joint resampling strategy (RIO). Our method outperforms state-of-the-art long-tailed detection and segmentation methods on LVIS v0.5 across various backbones. Code is available at https://github.com/NVlabs/RIO.
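The abstract names "dynamic, episodic memory banks" for object-level replay but gives no implementation details. As a minimal sketch only: the `ObjectMemoryBank` class, its `capacity`/`update`/`replay` API, and the inverse-frequency sampling weights below are all illustrative assumptions, not the paper's actual method.

```python
import random
from collections import deque


class ObjectMemoryBank:
    """Hypothetical per-class episodic memory of object features.

    One bounded FIFO per class: old entries are evicted as training
    progresses, so stored features track recent model states (loosely
    mirroring the "implicit feature-level augmentation from model
    updates" the abstract mentions).
    """

    def __init__(self, capacity=8):
        self.capacity = capacity
        self.banks = {}  # class_id -> deque of object features

    def update(self, class_id, feature):
        # Push a newly extracted object feature; deque evicts the oldest.
        self.banks.setdefault(class_id, deque(maxlen=self.capacity)).append(feature)

    def replay(self, batch_class_counts, n_samples):
        """Draw stored objects, favoring classes rare in the current batch.

        batch_class_counts: class_id -> count of that class in the batch.
        Weighting scheme (1 / (1 + count)) is an assumption for illustration.
        """
        weights = {c: 1.0 / (1 + batch_class_counts.get(c, 0)) for c in self.banks}
        classes = list(weights)
        drawn = random.choices(classes, weights=[weights[c] for c in classes], k=n_samples)
        return [(c, random.choice(self.banks[c])) for c in drawn]
```

A batch over-represented in class 0 would then be rebalanced by replaying stored features, with tail classes sampled more often, approximating object-level resampling without re-reading images.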