Recent one-stage object detectors follow a per-pixel prediction approach that predicts both the object category scores and the boundary positions from every single grid location. However, the most suitable positions for inferring different targets, i.e., the object category and the boundaries, are generally different. Predicting all these targets from the same grid location may thus lead to sub-optimal results. In this paper, we analyze the suitable inference positions for the object category and boundaries, and propose a prediction-target-decoupled detector named PDNet to establish a more flexible detection paradigm. With its prediction decoupling mechanism, PDNet encodes different targets separately at different locations. A learnable prediction collection module is devised with two sets of dynamic points, i.e., dynamic boundary points and semantic points, to collect and aggregate the predictions from the regions favorable for localization and classification. We adopt a two-step strategy to learn these dynamic point positions: prior positions are first estimated for the different targets, and the network then predicts residual offsets that refine these positions with a better perception of the object properties. Extensive experiments on the MS COCO benchmark demonstrate the effectiveness and efficiency of our method. With a single ResNeXt-64x4d-101-DCN as the backbone, our detector achieves 50.1 AP with single-scale testing, which outperforms the state-of-the-art methods by an appreciable margin under the same experimental settings. Moreover, our detector is highly efficient as a one-stage framework. Our code is publicly available at https://github.com/yangli18/PDNet.
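As a rough illustration of the two-step strategy described above, the following minimal sketch samples a dense prediction map at dynamic points given by a prior position plus a residual offset, and then aggregates the samples. It is not the PDNet implementation: the function names `bilinear_sample` and `collect_predictions`, the use of bilinear interpolation, and the mean aggregation are all assumptions made for illustration, and the offsets are passed in by hand rather than predicted by a network.

```python
import math


def bilinear_sample(grid, x, y):
    """Bilinearly interpolate a 2D list `grid` (rows indexed by y) at fractional (x, y)."""
    h, w = len(grid), len(grid[0])
    # Clamp the sampling position to the valid extent of the map.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    top = grid[y0][x0] * (1 - dx) + grid[y0][x1] * dx
    bottom = grid[y1][x0] * (1 - dx) + grid[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def collect_predictions(pred_map, prior_points, residual_offsets):
    """Hypothetical collection step: sample at prior + offset, then average.

    Mean aggregation is an assumption for this sketch, not the paper's rule.
    """
    samples = []
    for (px, py), (ox, oy) in zip(prior_points, residual_offsets):
        # Step 1: start from the prior position; step 2: refine by the residual offset.
        samples.append(bilinear_sample(pred_map, px + ox, py + oy))
    return sum(samples) / len(samples)


# Toy 3x3 prediction map with two dynamic points.
pred_map = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
priors = [(0.0, 0.0), (2.0, 2.0)]
offsets = [(0.5, 0.5), (-0.5, -0.5)]
aggregated = collect_predictions(pred_map, priors, offsets)
```

In a trained detector the offsets would come from a network branch and the sampling would run on feature or prediction tensors; the toy map here only shows how a prior position and an offset combine into a sampling location.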