Knowledge transfer between artificial neural networks has become an important topic in deep learning. Among the open questions are what kind of knowledge needs to be preserved for the transfer, and how it can be effectively achieved. Several recent works have shown strong performance from distillation methods that use relation-based knowledge. These algorithms are particularly attractive because they rely on simple inter-sample similarities. Nevertheless, a proper metric of affinity, and how it should be used in this context, is far from well understood. In this paper, by explicitly modularising knowledge distillation into a framework of three components, i.e. affinity, normalisation, and loss, we give a unified treatment of these algorithms and study a number of unexplored combinations of the modules. With this framework we perform extensive evaluations of numerous distillation objectives for image classification, obtain several useful insights into effective design choices, and demonstrate how relation-based knowledge distillation can achieve performance comparable to the state of the art despite its simplicity.
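As a rough illustration of the three-module decomposition described above, the sketch below shows one possible (not the paper's specific) combination of the components for a relation-based distillation objective: cosine affinity between samples in a batch, row-wise softmax normalisation, and a KL-divergence loss between the teacher and student relation matrices. Function and variable names are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def relation_kd_loss(feat_teacher: torch.Tensor, feat_student: torch.Tensor) -> torch.Tensor:
    """One hypothetical instance of the affinity -> normalisation -> loss framework."""
    # Affinity: pairwise cosine similarities between all samples in the batch.
    t = F.normalize(feat_teacher.flatten(1), dim=1)
    s = F.normalize(feat_student.flatten(1), dim=1)
    aff_t = t @ t.t()  # (B, B) teacher inter-sample affinity matrix
    aff_s = s @ s.t()  # (B, B) student inter-sample affinity matrix

    # Normalisation: turn each row of the affinity matrix into a distribution.
    p_t = F.softmax(aff_t, dim=1)
    log_p_s = F.log_softmax(aff_s, dim=1)

    # Loss: match the student's relations to the teacher's via KL divergence.
    return F.kl_div(log_p_s, p_t, reduction="batchmean")

# Usage sketch: distil batch-level relations from teacher features to student features.
feat_t = torch.randn(32, 512)  # teacher penultimate-layer features (assumed shape)
feat_s = torch.randn(32, 128)  # student penultimate-layer features (assumed shape)
loss = relation_kd_loss(feat_t, feat_s)
```

Because the objective only compares inter-sample similarity structures, the teacher and student feature dimensions need not match; each of the three modules (affinity metric, normalisation scheme, loss function) can be swapped independently, which is the design space the framework is meant to expose.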