Distilling from feature maps can be fairly effective for dense prediction tasks, since both feature discriminability and localization priors can be well transferred. However, not every pixel contributes equally to the performance, and a good student should learn from what really matters to the teacher. In this paper, we introduce a learnable embedding dubbed the receptive token to localize pixels of interest (PoIs) in the feature map, with a distillation mask generated via pixel-wise attention. Distillation is then performed on the mask via pixel-wise reconstruction. In this way, a distillation mask effectively indicates a pattern of pixel dependencies within the teacher's feature maps. We thus adopt multiple receptive tokens to capture more sophisticated and informative pixel dependencies and further enhance the distillation. To obtain a group of masks, the receptive tokens are learned via the regular task loss with the teacher fixed, and we additionally leverage a Dice loss to enrich the diversity of the learned masks. Our method, dubbed MasKD, is simple and practical, and requires no task-specific priors in application. Experiments show that MasKD achieves state-of-the-art performance consistently on object detection and semantic segmentation benchmarks. Code is available at: https://github.com/hunto/MasKD .
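To make the mechanism concrete, the following is a minimal PyTorch sketch of the idea described above, not the released implementation (see the repository for that). The names `ReceptiveTokens`, `dice_diversity_loss`, and `masked_distill_loss` are hypothetical, and the exact loss formulations are plausible assumptions rather than the paper's definitions.

```python
import torch
import torch.nn as nn

class ReceptiveTokens(nn.Module):
    """Learnable tokens that attend over feature-map pixels to produce
    soft distillation masks (a sketch of the idea, not the official code)."""
    def __init__(self, num_tokens: int, channels: int):
        super().__init__()
        # One learnable embedding per mask; num_tokens > 1 to capture
        # multiple pixel-dependency patterns.
        self.tokens = nn.Parameter(torch.randn(num_tokens, channels))

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) teacher feature map
        b, c, h, w = feat.shape
        pixels = feat.flatten(2)                       # (B, C, H*W)
        # Pixel-wise attention of each token over all pixels.
        attn = torch.einsum('kc,bcn->bkn', self.tokens, pixels)
        masks = attn.softmax(dim=-1)                   # normalize over pixels
        return masks.view(b, -1, h, w)                 # (B, K, H, W)

def dice_diversity_loss(masks: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Encourage the K masks to cover different pixels by penalizing
    pairwise Dice overlap (one plausible form of the diversity loss)."""
    b, k, _, _ = masks.shape
    m = masks.flatten(2)                               # (B, K, H*W)
    inter = torch.einsum('bkn,bjn->bkj', m, m)         # pairwise overlaps
    area = m.sum(-1)                                   # (B, K)
    dice = 2 * inter / (area.unsqueeze(2) + area.unsqueeze(1) + eps)
    # Zero out the diagonal (self-overlap) and average the rest.
    off_diag = dice - torch.diag_embed(torch.diagonal(dice, dim1=1, dim2=2))
    return off_diag.sum() / (b * k * max(k - 1, 1))

def masked_distill_loss(student_feat: torch.Tensor,
                        teacher_feat: torch.Tensor,
                        masks: torch.Tensor) -> torch.Tensor:
    """Pixel-wise reconstruction of teacher features, weighted by the masks.
    Assumes student features are already projected to the teacher's shape."""
    err = (student_feat - teacher_feat).pow(2).mean(dim=1, keepdim=True)  # (B,1,H,W)
    return (masks * err).mean()
```

Under this sketch, the tokens would first be trained with the regular task loss (teacher frozen) plus the Dice term; the resulting masks then weight the student-teacher reconstruction during distillation.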