Although deep convolutional neural networks (DCNNs) have achieved excellent performance in human pose estimation, these networks often have a large number of parameters and require heavy computation, leading to slow inference. An effective solution to this issue is knowledge distillation, which transfers knowledge from a large pre-trained network (the teacher) to a small network (the student). However, existing approaches have several defects: (I) only a single teacher is adopted, neglecting the potential for a student to learn from multiple teachers; (II) the human segmentation mask, which can serve as additional prior information to constrain the locations of keypoints, is never utilized; (III) a student with few parameters cannot fully imitate the heatmaps provided by datasets and teachers; (IV) heatmaps generated by teachers contain noise, which causes model degradation. To overcome these defects, we propose an orderly dual-teacher knowledge distillation (ODKD) framework consisting of two teachers with different capabilities. Specifically, the weaker teacher (primary teacher, PT) is used to teach keypoint information, while the stronger teacher (senior teacher, ST) transfers both segmentation and keypoint information by adding the human segmentation mask. Building on the two teachers, an orderly learning strategy is proposed to promote knowledge absorption. Moreover, we employ a binarization operation that further improves the learning ability of the student and reduces noise in heatmaps. Experimental results on the COCO and OCHuman keypoint datasets show that the proposed ODKD improves the performance of different lightweight models by a large margin, and HRNet-W16 equipped with ODKD achieves state-of-the-art performance for lightweight human pose estimation.
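To make the core recipe concrete, below is a minimal PyTorch sketch of one distillation stage combining a binarized teacher heatmap with ground-truth supervision. The function names, the threshold `tau`, and the weighting `alpha` are illustrative assumptions, not the paper's exact formulation; the same stage loss would be applied first with the primary teacher's keypoint heatmaps and then with the senior teacher's outputs to realize the orderly learning strategy.

```python
import torch
import torch.nn.functional as F

def binarize(heatmap: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """Binarize heatmaps: suppress low-confidence noise, keep peak regions.

    `heatmap` has shape (B, K, H, W); each channel is thresholded at a
    fraction `tau` of its own maximum (an assumed thresholding scheme).
    """
    peak = heatmap.amax(dim=(2, 3), keepdim=True)
    return (heatmap >= tau * peak).float()

def stage_loss(student_hm: torch.Tensor,
               teacher_hm: torch.Tensor,
               gt_hm: torch.Tensor,
               alpha: float = 0.5,
               tau: float = 0.5) -> torch.Tensor:
    """Loss for one teaching stage (PT or ST) of the dual-teacher setup.

    Blends imitation of the (binarized, noise-reduced) teacher heatmaps
    with standard ground-truth heatmap regression.
    """
    distill = F.mse_loss(student_hm, binarize(teacher_hm, tau))
    supervise = F.mse_loss(student_hm, gt_hm)
    return alpha * distill + (1.0 - alpha) * supervise
```

In this reading, the orderly strategy amounts to training the student against the primary teacher first and then against the senior teacher (whose heatmaps additionally encode the human segmentation mask), so the student absorbs easier keypoint knowledge before the richer joint signal.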