Representing features at multiple scales is of great importance for numerous vision tasks. Recent advances in backbone convolutional neural networks (CNNs) continually demonstrate stronger multi-scale representation ability, leading to consistent performance gains on a wide range of applications. However, most existing methods represent the multi-scale features in a layer-wise manner. In this paper, we propose a novel building block for CNNs, namely Res2Net, by constructing hierarchical residual-like connections within one single residual block. The Res2Net represents multi-scale features at a granular level and increases the range of receptive fields for each network layer. The proposed Res2Net block can be plugged into state-of-the-art backbone CNN models, e.g., ResNet, ResNeXt, and DLA. We evaluate the Res2Net block on all these models and demonstrate consistent performance gains over baseline models on widely-used datasets, e.g., CIFAR-100 and ImageNet. Ablation studies and experimental results on representative computer vision tasks, i.e., object detection, class activation mapping, and salient object detection, further verify the superiority of the Res2Net over the state-of-the-art baseline methods. The source code and trained models are available at https://mmcheng.net/res2net/.
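To make the hierarchical residual-like connections concrete, the following is a minimal sketch of the split-transform-concatenate structure inside a Res2Net block, assuming a PyTorch setting. The class name `Res2NetBlockSketch`, the `scale` default, and the plain 3x3 convolutions are illustrative assumptions, not the authors' released implementation.

```python
# Sketch of the hierarchical residual-like connections in a Res2Net block.
# Illustrative only: names and defaults are assumptions, not reference code.
import torch
import torch.nn as nn


class Res2NetBlockSketch(nn.Module):
    def __init__(self, channels: int, scale: int = 4):
        super().__init__()
        assert channels % scale == 0, "channels must be divisible by scale"
        self.scale = scale
        width = channels // scale
        # One 3x3 conv per feature subset except the first, which is passed
        # through unchanged; this forms the hierarchical connections.
        self.convs = nn.ModuleList(
            nn.Conv2d(width, width, kernel_size=3, padding=1, bias=False)
            for _ in range(scale - 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Split the input feature map into `scale` channel groups.
        xs = torch.chunk(x, self.scale, dim=1)
        ys = [xs[0]]  # the first subset is forwarded directly
        for i, conv in enumerate(self.convs):
            # Each subsequent subset receives the previous subset's output
            # before its own 3x3 conv, so the effective receptive field of
            # later subsets keeps growing within a single block.
            inp = xs[i + 1] if i == 0 else xs[i + 1] + ys[-1]
            ys.append(conv(inp))
        return torch.cat(ys, dim=1)
```

In a full bottleneck block, this structure would sit between the two 1x1 convolutions, replacing the single 3x3 convolution of a ResNet-style block, which is how it can be plugged into backbones such as ResNet, ResNeXt, and DLA.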