Distilling knowledge from a large teacher model to a lightweight one is a widely successful approach for generating compact, powerful models in the semi-supervised learning setting where only a limited amount of labeled data is available. In large-scale applications, however, the teacher tends to provide a large number of incorrect soft-labels that impair student performance. The sheer size of the teacher additionally constrains the number of soft-labels that can be queried due to prohibitive computational and/or financial costs. The difficulty in achieving simultaneous \emph{efficiency} (i.e., minimizing soft-label queries) and \emph{robustness} (i.e., avoiding student inaccuracies due to incorrect labels) hinders the widespread application of knowledge distillation to many modern tasks. In this paper, we present a parameter-free approach with provable guarantees to query the soft-labels of points that are simultaneously informative and correctly labeled by the teacher. At the core of our work lies a game-theoretic formulation that explicitly considers the inherent trade-off between the informativeness and correctness of input instances. We establish bounds on the expected performance of our approach that hold even in worst-case distillation instances. We present empirical evaluations on popular benchmarks that demonstrate the improved distillation performance enabled by our work relative to state-of-the-art active learning and active distillation methods.