Online knowledge distillation (KD) has received increasing attention in recent years. However, most existing online KD methods focus on designing complicated model structures and training strategies to improve the distillation of high-level knowledge such as probability distributions, while the effects of multi-level knowledge, especially low-level knowledge, are largely overlooked. To offer a new perspective on online KD, we propose MetaMixer, a regularization strategy that strengthens distillation by combining low-level knowledge, which governs the localization capability of the networks, with high-level knowledge, which attends to the whole image. Experiments under different conditions show that MetaMixer achieves significant performance gains over state-of-the-art methods.
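The abstract does not specify MetaMixer's formulation, but the idea of mixing low-level (feature-map) and high-level (probability-distribution) knowledge in an online, mutual-distillation setting can be illustrated with a minimal PyTorch sketch. Everything below is an assumption for illustration: the loss form, the mixing weights `alpha`/`beta`, and the use of mutual KL divergence plus feature-map matching are not taken from the paper.

```python
import torch
import torch.nn.functional as F

def online_multilevel_kd_loss(feat_a, feat_b, logits_a, logits_b, targets,
                              temperature=4.0, alpha=1.0, beta=1.0):
    """Illustrative online KD objective mixing low- and high-level knowledge.

    feat_a, feat_b:     intermediate feature maps (N, C, H, W) from two peers
    logits_a, logits_b: class logits (N, K) from the two peer networks
    targets:            ground-truth labels (N,)
    NOTE: this is a hypothetical sketch, not MetaMixer's actual loss.
    """
    # Supervised cross-entropy for both peer networks.
    ce = F.cross_entropy(logits_a, targets) + F.cross_entropy(logits_b, targets)

    # High-level knowledge: mutual distillation of softened class
    # distributions, as in standard online (mutual) KD.
    T = temperature
    kl_ab = F.kl_div(F.log_softmax(logits_a / T, dim=1),
                     F.softmax(logits_b / T, dim=1).detach(),
                     reduction="batchmean") * T * T
    kl_ba = F.kl_div(F.log_softmax(logits_b / T, dim=1),
                     F.softmax(logits_a / T, dim=1).detach(),
                     reduction="batchmean") * T * T

    # Low-level knowledge: match spatial feature maps between peers,
    # which carry the localization cues the abstract refers to.
    low = (F.mse_loss(feat_a, feat_b.detach())
           + F.mse_loss(feat_b, feat_a.detach()))

    return ce + alpha * (kl_ab + kl_ba) + beta * low
```

In this sketch each peer detaches the other's outputs so that gradients flow only into the student side of each term, a common design choice in mutual distillation; the actual MetaMixer regularizer may combine the two knowledge levels differently.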