Despite the recent success of deep neural networks, it remains challenging to effectively model the long-tail class distribution in visual recognition tasks. To address this problem, we first investigate the performance bottleneck of the two-stage learning framework via an ablative study. Motivated by our findings, we propose a unified distribution alignment strategy for long-tail visual recognition. Specifically, we develop an adaptive calibration function that adjusts the classification scores for each data point. We then introduce a generalized re-weighting method in the two-stage learning to balance the class prior, which provides a flexible and unified solution for diverse scenarios in visual recognition tasks. We validate our method through extensive experiments on four tasks: image classification, semantic segmentation, object detection, and instance segmentation. Our approach achieves state-of-the-art results on all four recognition tasks with a simple and unified framework. The code and models will be made publicly available at: https://github.com/Megvii-BaseDetection/DisAlign
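The abstract names two components, an adaptive calibration function and a generalized re-weighting, without giving their formulas. As a rough illustration only, the sketch below shows one plausible PyTorch form of each: a gated per-class linear transform of the logits, and per-class loss weights derived from the empirical class prior. The module name `AdaptiveCalibration`, the gated linear form, and the exponent `rho` are assumptions made for this sketch, not the paper's exact definitions.

```python
import torch
import torch.nn as nn

class AdaptiveCalibration(nn.Module):
    """Hypothetical per-class calibration of classifier logits (assumption:
    one plausible reading of the "adaptive calibration function" above).

    Calibrated score: s_hat = g(x) * (alpha * s + beta) + (1 - g(x)) * s,
    where g(x) is an instance-wise confidence gate predicted from the feature
    and alpha/beta are learnable per-class scale and shift parameters.
    """
    def __init__(self, num_classes: int, feat_dim: int):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(num_classes))
        self.beta = nn.Parameter(torch.zeros(num_classes))
        self.confidence = nn.Linear(feat_dim, 1)  # instance-adaptive gate

    def forward(self, logits: torch.Tensor, feats: torch.Tensor) -> torch.Tensor:
        gate = torch.sigmoid(self.confidence(feats))  # (B, 1), in [0, 1]
        calibrated = self.alpha * logits + self.beta  # (B, C) via broadcasting
        return gate * calibrated + (1.0 - gate) * logits


def generalized_reweight(class_counts: torch.Tensor, rho: float = 1.0) -> torch.Tensor:
    """Per-class loss weights from the empirical class prior (assumption:
    a generic prior-based re-weighting, not the paper's exact scheme).

    Weight w_c is proportional to (1 / p_c)^rho, normalized to mean 1; rho
    interpolates between no re-weighting (rho = 0) and full inverse-frequency
    weighting (rho = 1).
    """
    prior = class_counts.float() / class_counts.sum()
    weights = prior.pow(-rho)
    return weights / weights.mean()
```

In a two-stage setup, such weights could be passed directly to the loss in the second stage, e.g. `torch.nn.functional.cross_entropy(scores, targets, weight=generalized_reweight(counts, rho=0.5))`, leaving the stage-one representation learning untouched.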