Classification and localization are the two main sub-tasks in object detection. However, these two tasks have inconsistent preferences for feature context: localization expects more boundary-aware features to accurately regress the bounding box, while classification prefers more semantic context. Existing methods usually leverage disentangled heads to learn different feature context for each task. Yet these heads are still applied to the same input features, which leads to an imperfect balance between classification and localization. In this work, we propose a novel Task-Specific COntext DEcoupling (TSCODE) head, which further disentangles the feature encoding for the two tasks. For classification, we generate a spatially coarse but semantically strong feature encoding. For localization, we provide a high-resolution feature map containing more edge information to better regress object boundaries. TSCODE is plug-and-play and can be easily incorporated into existing detection pipelines. Extensive experiments demonstrate that our method consistently improves different detectors by over 1.0 AP with less computational cost. Our code and models will be publicly released.
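The core idea of giving each task its own input encoding can be illustrated with a small sketch. It assumes two adjacent feature-pyramid levels, a finer `p_fine` of shape (C, 2H, 2W) and a coarser `p_coarse` of shape (C, H, W). The classification input stays on the coarse grid and aggregates both levels. The localization input keeps the fine grid so edge detail is preserved. All function names are hypothetical, and the fixed average pooling, nearest-neighbour upsampling, concatenation and addition used here stand in for whatever learned layers TSCODE actually uses. This is not the authors' implementation.

```python
import numpy as np


def avg_pool_2x2(x: np.ndarray) -> np.ndarray:
    """Halve the spatial resolution of a (C, H, W) map by 2x2 average pooling."""
    c, h, w = x.shape
    assert h % 2 == 0 and w % 2 == 0, "spatial dims must be even"
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))


def upsample_nearest_2x(x: np.ndarray) -> np.ndarray:
    """Double the spatial resolution of a (C, H, W) map by nearest-neighbour repetition."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def classification_context(p_fine: np.ndarray, p_coarse: np.ndarray) -> np.ndarray:
    """Hypothetical classification input: coarse grid with wider semantic context.

    The finer level is pooled to the coarser grid and concatenated along the
    channel axis, so each cell sees both pyramid levels at low resolution.
    """
    return np.concatenate([avg_pool_2x2(p_fine), p_coarse], axis=0)


def localization_context(p_fine: np.ndarray, p_coarse: np.ndarray) -> np.ndarray:
    """Hypothetical localization input: fine grid that keeps boundary detail.

    The coarser level is upsampled and added to the finer level, so the output
    keeps the resolution of p_fine.
    """
    return p_fine + upsample_nearest_2x(p_coarse)


# Usage with made-up shapes: 8 channels, a 16x16 level and the 8x8 level above it.
rng = np.random.default_rng(0)
p3 = rng.standard_normal((8, 16, 16))
p4 = rng.standard_normal((8, 8, 8))
cls_feat = classification_context(p3, p4)
loc_feat = localization_context(p3, p4)
print(cls_feat.shape, loc_feat.shape)  # (16, 8, 8) (8, 16, 16)
```

The two task heads would then consume `cls_feat` and `loc_feat` respectively, rather than sharing one feature map, which is the decoupling the abstract describes.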