Modern object detectors have taken advantage of backbone networks pre-trained on large-scale datasets. Apart from the backbone, however, other components such as the detector head and the feature pyramid network (FPN) remain trained from scratch, which hinders fully tapping the potential of representation models. In this study, we propose to integrally migrate pre-trained transformer encoder-decoders (imTED) to a detector, constructing a feature extraction path that is ``fully pre-trained'' so that the detector's generalization capacity is maximized. The essential differences between imTED and the baseline detector are twofold: (1) migrating the pre-trained transformer decoder to the detector head while removing the randomly initialized FPN from the feature extraction path; and (2) defining a multi-scale feature modulator (MFM) to enhance scale adaptability. These designs not only significantly reduce the number of randomly initialized parameters but also unify detector training with representation learning by design. Experiments on the MS COCO object detection dataset show that imTED consistently outperforms its counterparts by $\sim$2.4 AP. Without bells and whistles, imTED improves the state of the art of few-shot object detection by up to 7.6 AP. Code is available at https://github.com/LiewFeng/imTED.
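To make the architectural change concrete, the following is a minimal PyTorch sketch of the idea stated above: the detector head is built from a pre-trained transformer decoder instead of randomly initialized layers, so only the small task-specific classification and regression layers start from random weights. The class names, tensor shapes, and the gating used for the MFM are illustrative assumptions, not the released implementation (see the repository above for the actual code).

```python
# Illustrative sketch of the imTED feature-extraction path. Assumed names:
# MultiScaleFeatureModulator and ImTEDHead are hypothetical stand-ins.
import torch
import torch.nn as nn


class MultiScaleFeatureModulator(nn.Module):
    """Placeholder MFM: gates RoI tokens with a learned, feature-conditioned
    scale factor. The paper's MFM design may differ; this only illustrates
    where scale modulation sits in the pipeline."""

    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, roi_feats: torch.Tensor) -> torch.Tensor:
        # roi_feats: (num_rois, tokens, dim); gate from the mean token.
        return roi_feats * self.gate(roi_feats.mean(dim=1, keepdim=True))


class ImTEDHead(nn.Module):
    """Detector head whose decoder weights are *migrated* from a pre-trained
    encoder-decoder (e.g., an MAE decoder) rather than randomly initialized."""

    def __init__(self, decoder: nn.TransformerDecoder, dim: int, num_classes: int):
        super().__init__()
        self.mfm = MultiScaleFeatureModulator(dim)
        self.decoder = decoder                        # pre-trained, migrated
        self.cls_head = nn.Linear(dim, num_classes)   # randomly initialized (small)
        self.reg_head = nn.Linear(dim, 4)             # randomly initialized (small)

    def forward(self, roi_feats: torch.Tensor):
        x = self.mfm(roi_feats)
        x = self.decoder(x, x)        # refine RoI tokens with the decoder
        x = x.mean(dim=1)             # pool tokens per RoI
        return self.cls_head(x), self.reg_head(x)


# Usage with dummy RoI features (16 RoIs, 7x7 tokens, 256-dim each).
dim, num_classes = 256, 80
layer = nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True)
decoder = nn.TransformerDecoder(layer, num_layers=2)  # load migrated weights here
head = ImTEDHead(decoder, dim, num_classes)
cls_logits, boxes = head(torch.randn(16, 49, dim))
print(cls_logits.shape, boxes.shape)  # (16, 80) and (16, 4)
```

In this sketch, the only randomly initialized modules are the two linear prediction layers and the illustrative gate, which mirrors the abstract's point that the feature extraction path itself carries no randomly initialized FPN or head.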