Sparse R-CNN is a recent strong object detection baseline that performs set prediction on sparse, learnable proposal boxes and proposal features. In this work, we propose to improve Sparse R-CNN with two dynamic designs. First, Sparse R-CNN adopts a one-to-one label assignment scheme, in which the Hungarian algorithm matches only one positive sample to each ground truth. Such one-to-one assignment may not be optimal for matching the learned proposal boxes to the ground truths. To address this problem, we propose dynamic label assignment (DLA), based on the optimal transport algorithm, to assign an increasing number of positive samples across the iterative training stages of Sparse R-CNN. We make the matching constraint gradually looser in the sequential stages, since later stages produce refined proposals with improved precision. Second, the learned proposal boxes and features remain fixed across different images during inference in Sparse R-CNN. Motivated by dynamic convolution, we propose dynamic proposal generation (DPG), which dynamically assembles multiple proposal experts to provide better initial proposal boxes and features for the subsequent training stages. DPG can thereby derive sample-dependent proposal boxes and features for inference. Experiments demonstrate that our method, named Dynamic Sparse R-CNN, improves the strong Sparse R-CNN baseline with different backbones for object detection. In particular, Dynamic Sparse R-CNN reaches a state-of-the-art 47.2% AP on the COCO 2017 validation set, surpassing Sparse R-CNN by 2.2% AP with the same ResNet-50 backbone.
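The two ideas can be illustrated with a minimal sketch in plain Python. It is not the implementation behind the reported results, and it makes several assumptions: the function names are hypothetical, the expert logits are taken as given rather than predicted from image features, a greedy top-k selection stands in for the optimal transport solver, and the schedule `k = base_k + stage` is only one possible way to loosen the matching in later stages.

```python
import math


def assemble_proposals(expert_logits, expert_boxes):
    """Blend per-expert proposal boxes with softmax weights (DPG-style sketch).

    expert_logits: one score per expert, assumed to come from an
                   image-dependent predictor (hypothetical, not shown here).
    expert_boxes:  expert_boxes[e][p] is a 4-value box for expert e, proposal p.
    """
    peak = max(expert_logits)
    exps = [math.exp(v - peak) for v in expert_logits]  # numerically stable softmax
    total = sum(exps)
    weights = [v / total for v in exps]
    blended = []
    for p in range(len(expert_boxes[0])):
        box = [
            sum(w * expert_boxes[e][p][c] for e, w in enumerate(weights))
            for c in range(4)
        ]
        blended.append(box)
    return weights, blended


def assign_positives(cost, k):
    """Pick the k lowest-cost proposals per ground truth (top-k stand-in for OT).

    cost[g][p] is the matching cost between ground truth g and proposal p.
    A proposal claimed by several ground truths stays with the cheapest one.
    Returns a dict mapping proposal index to ground-truth index.
    """
    best = {}
    for g, row in enumerate(cost):
        ranked = sorted(range(len(row)), key=lambda p: row[p])[:k]
        for p in ranked:
            if p not in best or row[p] < cost[best[p]][p]:
                best[p] = g
    return best


def positives_per_stage(cost, num_stages, base_k=1):
    """Loosen the matching stage by stage: stage s uses k = base_k + s (assumed)."""
    return [assign_positives(cost, base_k + s) for s in range(num_stages)]


if __name__ == "__main__":
    # Two experts, one proposal each; equal logits give an even blend.
    weights, boxes = assemble_proposals([0.0, 0.0], [[[0, 0, 2, 2]], [[2, 2, 4, 4]]])
    print(weights, boxes)

    # Two ground truths, four proposals; later stages admit more positives.
    cost = [[0.1, 0.5, 0.9, 0.3], [0.8, 0.2, 0.4, 0.7]]
    for stage, matches in enumerate(positives_per_stage(cost, num_stages=3)):
        print(stage, sorted(matches.items()))
```

With `k = 1` the assignment reduces to one positive per ground truth, which mirrors the one-to-one matching of the baseline. Larger `k` in later stages admits more positives, on the premise that refined proposals are more reliable.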