Knowledge transfer from a complex, high-performing model to a simpler and potentially lower-performing one in order to enhance its performance has been of great interest over the last few years, as it has applications in important problems such as explainable artificial intelligence, model compression, robust model building and learning from small data. Known approaches to this problem (viz. Knowledge Distillation, Model Compression, ProfWeight, etc.) typically transfer information directly (i.e., in a single hop) from the complex model to the chosen simple model, through schemes that modify the target or reweight the training examples on which the simple model is trained. In this paper, we propose a meta-approach in which we transfer information from the complex model to the simple model by dynamically selecting and/or constructing a sequence of intermediate models of decreasing complexity, each less intricate than the original complex model. Our approach can transfer information between consecutive models in the sequence using any of the previously mentioned approaches, and can also operate in a 1-hop fashion, thus generalizing these approaches. In experiments on real data, we observe consistent gains over the 1-hop approach for different choices of models, averaging more than 2\% and reaching up to 8\% in one case. We also empirically analyze conditions under which the multi-hop approach is likely to be beneficial over the traditional 1-hop approach, and report other interesting insights. To the best of our knowledge, this is the first work that proposes such a multi-hop approach to perform knowledge transfer given a single high-performing complex model, making it, in our opinion, an important methodological contribution.
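To make the multi-hop idea concrete, below is a minimal sketch in Python/PyTorch of a fixed two-hop chain (teacher to intermediate to student) using knowledge distillation as the per-hop transfer scheme. The model sizes, temperature, loss weights, synthetic data, and the use of distillation for every hop are illustrative assumptions, not the paper's exact method, which also allows other per-hop transfer schemes and dynamic selection of the intermediate models.

\begin{verbatim}
# Minimal multi-hop knowledge-transfer sketch (assumptions: PyTorch,
# synthetic data, distillation as the per-hop transfer scheme).
import torch
import torch.nn as nn
import torch.nn.functional as F

def mlp(in_dim, hidden, out_dim):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))

def distill(teacher, student, X, y, T=4.0, alpha=0.7, epochs=50):
    """One hop: fit `student` to the teacher's softened outputs plus labels."""
    teacher.eval()
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()
        with torch.no_grad():
            t_logits = teacher(X)
        s_logits = student(X)
        kd = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                      F.softmax(t_logits / T, dim=1),
                      reduction="batchmean") * T * T
        ce = F.cross_entropy(s_logits, y)
        (alpha * kd + (1 - alpha) * ce).backward()
        opt.step()
    return student

# Synthetic two-class data standing in for a real training set.
X = torch.randn(512, 20)
y = (X[:, 0] + X[:, 1] > 0).long()

# Chain of decreasing complexity: complex teacher -> intermediate -> simple student.
teacher = mlp(20, 256, 2)       # assumed already trained to high accuracy
intermediate = mlp(20, 64, 2)
student = mlp(20, 8, 2)

# Multi-hop transfer: each model in the chain distills into the next one.
for src, dst in [(teacher, intermediate), (intermediate, student)]:
    distill(src, dst, X, y)
\end{verbatim}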