Rotated bounding boxes drastically reduce the output ambiguity of elongated objects, making them superior to axis-aligned bounding boxes. Despite this effectiveness, rotated detectors are not widely employed. Annotating rotated bounding boxes is so laborious that many detection datasets do not provide them and use axis-aligned annotations instead. In this paper, we propose a framework that allows the model to predict precise rotated boxes while requiring only cheaper axis-aligned annotation of the target dataset. To achieve this, we leverage the fact that neural networks are capable of learning a richer representation of the target domain than what is utilized by the task. The under-utilized representation can be exploited to address a more detailed task. Our framework combines task knowledge from an out-of-domain source dataset with stronger annotation and domain knowledge from the target dataset with weaker annotation. A novel assignment process and a projection loss enable co-training on the source and target datasets. As a result, the model is able to solve the more detailed task in the target domain without additional computational overhead during inference. We extensively evaluate the method on various target datasets, including a fresh-produce dataset, HRSC2016, and SSDD. Results show that the proposed method consistently performs on par with the fully supervised approach.
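The abstract does not define the projection loss, so the following is only a minimal sketch of one plausible reading: a predicted rotated box is projected onto its tightest enclosing axis-aligned box, which can then be compared against the axis-aligned annotation of the target dataset. The box parameterization `(cx, cy, w, h, theta)`, the function names, and the L1 penalty are all assumptions made for illustration, not the formulation used in the paper.

```python
import math


def project_to_axis_aligned(cx, cy, w, h, theta):
    """Tightest axis-aligned box enclosing a rotated box.

    Hypothetical parameterization: center (cx, cy), side lengths (w, h),
    rotation theta in radians. Returns (x_min, y_min, x_max, y_max).
    """
    cos_t = abs(math.cos(theta))
    sin_t = abs(math.sin(theta))
    # Extent of the rotated rectangle along the x and y axes.
    proj_w = w * cos_t + h * sin_t
    proj_h = w * sin_t + h * cos_t
    return (
        cx - proj_w / 2.0,
        cy - proj_h / 2.0,
        cx + proj_w / 2.0,
        cy + proj_h / 2.0,
    )


def projection_l1_loss(pred_rotated, target_axis_aligned):
    """Mean absolute error between the projected prediction and the
    axis-aligned annotation (an illustrative choice of penalty)."""
    projected = project_to_axis_aligned(*pred_rotated)
    diffs = [abs(p - t) for p, t in zip(projected, target_axis_aligned)]
    return sum(diffs) / len(diffs)
```

Under this sketch, a loss of zero only means the predicted box fits the axis-aligned annotation; several rotated boxes share the same projection, which is presumably why the framework also relies on task knowledge from a source dataset with rotated annotations.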