Knowledge distillation is a popular machine learning technique that aims to transfer knowledge from a large 'teacher' network to a smaller 'student' network and improve the student's performance by training it to emulate the teacher. In recent years, there has been significant progress in novel distillation techniques that push performance frontiers across multiple problems and benchmarks. Most of the reported work focuses on achieving state-of-the-art results on a specific problem; however, there remains a significant gap in understanding the distillation process itself and how it behaves under different training scenarios. Similarly, transfer learning (TL) is an effective technique for training neural networks faster on limited datasets by reusing representations learned from a different but related problem. Despite the effectiveness and popularity of both techniques, there has been little exploration of knowledge distillation applied to transfer learning. In this thesis, we propose a machine learning architecture we call TL+KD that combines knowledge distillation with transfer learning; we then present a quantitative and qualitative comparison of TL+KD with TL in the domain of image classification. Through this work, we show that guidance and knowledge from a larger teacher network during fine-tuning improves the student network's validation performance on metrics such as accuracy. We characterize this improvement using a variety of metrics beyond accuracy alone, and study the model's behavior in scenarios such as input degradation.
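To make the TL+KD setup concrete, the following is a minimal sketch of the fine-tuning objective, assuming the standard Hinton-style distillation loss (cross-entropy on ground-truth labels plus a temperature-softened teacher-matching term); the function name, `alpha`, and `T` are illustrative assumptions rather than values taken from the thesis.

```python
import torch
import torch.nn.functional as F

def tl_kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Weighted sum of the hard-label loss and the teacher-matching (KD) loss."""
    # Hard-label term: ordinary cross-entropy against the ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)
    # Soft-label term: KL divergence between temperature-softened distributions
    # of the student and the teacher. The T**2 factor keeps gradient magnitudes
    # comparable across different temperatures.
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T ** 2)
    return alpha * ce + (1.0 - alpha) * kd
```

In the transfer-learning setting sketched here, the student would start from a backbone pretrained on a related problem and be fine-tuned with this combined loss, while the teacher's logits are computed under `torch.no_grad()` so that only the student is updated.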