Natural data are often long-tail distributed over semantic classes. Existing recognition methods tend to focus on gaining tail performance, often at the expense of head performance due to increased classifier variance. Low tail performance manifests itself as large inter-class confusion and high classifier variance. We aim to reduce both the bias and the variance of a long-tailed classifier by RoutIng Diverse Experts (RIDE). It has three components: 1) a shared architecture for multiple classifiers (experts); 2) a distribution-aware diversity loss that encourages more diverse decisions for classes with fewer training instances; and 3) an expert routing module that dynamically assigns more ambiguous instances to additional experts. At comparable computational complexity, RIDE significantly outperforms state-of-the-art methods by 5% to 7% on all benchmarks, including CIFAR100-LT, ImageNet-LT, and iNaturalist. RIDE is also a universal framework that can be applied to different backbone networks and integrated into various long-tailed algorithms and training mechanisms for consistent performance gains.
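To make the three components concrete, below is a minimal PyTorch sketch of a RIDE-style model. It is an illustration under stated assumptions, not the authors' implementation: the module and argument names (`RIDESketch`, `num_experts`, `class_counts`, `base_temp`) are hypothetical, and the diversity loss only schematically captures the distribution-aware idea (experts are pushed apart more strongly on instances of rarer classes), not the paper's exact formulation.

```python
# Hypothetical sketch of RIDE's three components: a shared backbone with
# multiple expert heads, a distribution-aware diversity loss, and a routing
# score for sending ambiguous instances to additional experts.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RIDESketch(nn.Module):
    def __init__(self, feat_dim: int, num_classes: int, num_experts: int = 3):
        super().__init__()
        # 1) Shared architecture: one backbone feeds several lightweight experts.
        self.backbone = nn.Sequential(nn.LazyLinear(feat_dim), nn.ReLU())
        self.experts = nn.ModuleList(
            [nn.Linear(feat_dim, num_classes) for _ in range(num_experts)]
        )
        # 3) Router scores how ambiguous an instance is; at inference, low-score
        # (ambiguous) instances would be assigned to additional experts.
        self.router = nn.Linear(feat_dim, 1)

    def forward(self, x):
        feat = self.backbone(x)
        logits = torch.stack([e(feat) for e in self.experts], dim=1)  # (B, E, C)
        route_score = torch.sigmoid(self.router(feat)).squeeze(-1)    # (B,)
        return logits, route_score


def diversity_loss(logits, targets, class_counts, base_temp=2.0):
    """2) Distribution-aware diversity (schematic): penalize agreement with the
    ensemble mean, more strongly for classes with fewer training instances."""
    # Rarer classes get a higher temperature, i.e. a stronger diversity push.
    freq = class_counts.float() / class_counts.sum()
    temp = base_temp * (1.0 - freq[targets]).unsqueeze(1)   # (B, 1) per instance
    mean_logits = logits.mean(dim=1)                        # ensemble mean (B, C)
    loss = 0.0
    for k in range(logits.size(1)):
        log_p_expert = F.log_softmax(logits[:, k] / temp, dim=-1)
        p_mean = F.softmax(mean_logits / temp, dim=-1)
        # Negative KL: reward each expert for deviating from the ensemble mean.
        loss = loss - F.kl_div(log_p_expert, p_mean, reduction="batchmean")
    return loss / logits.size(1)
```

In this sketch the total training objective would combine a standard classification loss on each expert with the diversity term, and the router threshold trades accuracy for computation at inference, consistent with the abstract's claim of comparable computational complexity.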