CBNetV2: A Composite Backbone Network Architecture for Object Detection

Modern top-performing object detectors depend heavily on backbone networks, whose advances bring consistent performance gains through exploring more effective network structures. However, designing or searching for a new backbone and pre-training it on ImageNet may require substantial computational resources, making it costly to obtain better detection performance. In this paper, we propose a novel backbone network, namely CBNetV2, by constructing compositions of existing open-sourced pre-trained backbones. In particular, the CBNetV2 architecture groups multiple identical backbones, which are connected through composite connections. We also propose a better training strategy with Assistant Supervision for CBNet-based detectors. Without additional pre-training, CBNetV2 can be integrated into mainstream detectors, including one-stage and two-stage detectors as well as anchor-based and anchor-free ones, and significantly improves their performance by more than 3.0% AP over the baseline on COCO. Experiments also provide strong evidence that composite backbones are more efficient and resource-friendly than pre-trained wider and deeper networks, including manually designed and NAS-based ones as well as CNN-based and Transformer-based ones. In particular, with single-model and single-scale testing, our HTC Dual-Swin-B achieves 58.6% box AP and 51.1% mask AP on COCO test-dev, significantly better than the state-of-the-art result (i.e., 57.7% box AP and 50.2% mask AP) achieved by a stronger baseline, HTC++, with a larger backbone, Swin-L. Code will be released at https://github.com/VDIGPKU/CBNetV2.
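The idea of grouping identical backbones through composite connections, plus an assistant-supervised loss, can be illustrated with a toy sketch in plain Python. This is not the authors' implementation: feature maps are 1-D lists instead of tensors, and `ToyBackbone`, `composite_forward`, `assisted_loss`, `alpha`, and `lam` are hypothetical names. The specific composite connection (before each stage of the next backbone, add scaled, upsampled same-level and higher-level outputs of the previous backbone) and the loss form (lead loss plus down-weighted assisting losses) are assumptions chosen for illustration.

```python
from typing import List, Optional

FeatureMap = List[float]  # toy 1-D stand-in for a (C, H, W) feature tensor


def downsample(x: FeatureMap) -> FeatureMap:
    """Stride-2 average pooling: halves the resolution of a toy feature map."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]


def upsample(x: FeatureMap, size: int) -> FeatureMap:
    """Nearest-neighbour upsampling to `size` elements (size must be a multiple of len(x))."""
    factor = size // len(x)
    return [v for v in x for _ in range(factor)]


class ToyBackbone:
    """Hypothetical stand-in for a pre-trained backbone (e.g. a ResNet or Swin).

    Each stage halves the resolution, applies a learned scale, then a ReLU.
    """

    def __init__(self, stage_weights: List[float]):
        self.stage_weights = stage_weights

    def num_stages(self) -> int:
        return len(self.stage_weights)

    def stage(self, l: int, x: FeatureMap) -> FeatureMap:
        return [max(0.0, self.stage_weights[l] * v) for v in downsample(x)]


def composite_forward(backbones: List[ToyBackbone], image: FeatureMap,
                      alpha: float = 0.5) -> List[List[FeatureMap]]:
    """Run K identical backbones in sequence and return every backbone's stage outputs.

    Assumed composite connection: before stage l of backbone k, add
    alpha * upsample(x_i^{k-1}) for each output i >= l of backbone k-1.
    """
    all_outputs: List[List[FeatureMap]] = []
    prev: Optional[List[FeatureMap]] = None
    for bb in backbones:
        x = image
        outputs: List[FeatureMap] = []
        for l in range(bb.num_stages()):
            if prev is not None:
                # Fuse same- and higher-level features from the previous backbone.
                for feat in prev[l:]:
                    up = upsample(feat, len(x))
                    x = [a + alpha * b for a, b in zip(x, up)]
            x = bb.stage(l, x)
            outputs.append(x)
        all_outputs.append(outputs)
        prev = outputs
    # The detection head would consume all_outputs[-1], the lead backbone's features.
    return all_outputs


def assisted_loss(lead_loss: float, assist_losses: List[float], lam: float = 0.5) -> float:
    """Assumed Assistant Supervision form: lead loss plus weighted assisting-backbone losses."""
    return lead_loss + lam * sum(assist_losses)


if __name__ == "__main__":
    image = [float(i) for i in range(16)]
    dual = [ToyBackbone([1.0, 2.0, 0.5, 1.5]) for _ in range(2)]  # "Dual" composite backbone
    feats = composite_forward(dual, image)
    print([len(f) for f in feats[-1]])  # stage resolutions of the lead backbone
    print(assisted_loss(1.0, [0.25, 0.75]))
```

Setting `alpha=0` disconnects the backbones, so identical backbones then produce identical features; with `alpha > 0` the lead backbone also sees the assisting backbone's multi-level features. Because every backbone has the same architecture, each can be initialized from the same open-source pre-trained checkpoint, which is why no additional pre-training is needed.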

翻译：现代顶级性能天体探测器严重依赖主干网,其进步通过探索更有效的网络结构带来一致的绩效收益。然而,设计或寻找新的主干网并在图像网络上进行预培训可能需要大量计算资源,因此要获得更好的检测性能成本很高。在本文中,我们提议建立一个新型的主干网,即CBNetV2, 即CBNetV2, 构建现有开放源的预培训性骨的构成;特别是, CBNetV2 架构组多个相同的主干网,这些主干网通过复合连接连接连接,可以带来一致的绩效收益收益。我们还提议与CBNBNet2 助理监督员一道制定更好的培训战略。没有额外的培训前,CBNet2 就可以将新的骨干网纳入主流探测器,包括一阶段和两阶段的探测器,以及基于固定基地和固定基地的无固定基地值的网络,并在COCO的基线基准线上大大改进它们的性能,超过3.0%;此外,实验提供了有力的证据表明,复合骨干网比预先训练的更广泛和深层次的网络,包括基于手基的和NAS基础的网络,以及GIS-基础和变式的基线的HA-C-C-C-C-C-C-C-B-C-B-C-B-B-B-B-B-C-C-B-B-B-B-B-C-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-C-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-B-