We propose a response-based knowledge distillation (KD) method for the head pose estimation problem. A student model trained with the proposed KD achieves better results than the teacher model, which is atypical for response-based methods. Our method consists of two stages. In the first stage, we train a base neural network (NN) with one regression head and four regression-via-classification (RvC) heads. We then build a convolutional ensemble over the base NN using offsets of the face bounding boxes over a regular grid. In the second stage, we perform KD from the convolutional ensemble into a final NN with a single RvC head. The KD improves the results by an average of 7.7\% compared to the base NN. This property makes it possible to use KD as a booster and to effectively train deeper NNs. NNs trained with our KD method partially improve the state-of-the-art results: KD-ResNet152 achieves the best overall results, and KD-ResNet18 achieves better results on the AFLW2000 dataset than any previous method. We have made the trained NNs and the face bounding boxes for the 300W-LP, AFLW, AFLW2000, and BIWI datasets publicly available. Our method can potentially be effective for other regression problems.
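To make the RvC heads and the response-based distillation step concrete, the following is a minimal PyTorch sketch. It assumes HopeNet-style binned angles recovered as an expectation over bin centers and a standard soft-target KD loss; the bin count, angle range, temperature, and the names `RvCHead` and `kd_loss` are illustrative assumptions, not values or identifiers taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RvCHead(nn.Module):
    """Regression-via-classification head: predicts logits over angle bins and
    recovers a continuous angle as the expectation over the bin centers.
    Bin count and angle range here are illustrative, not the paper's values."""
    def __init__(self, in_features, num_bins=66, angle_min=-99.0, angle_max=99.0):
        super().__init__()
        self.fc = nn.Linear(in_features, num_bins)
        # Fixed bin centers (degrees) spanning the assumed angle range.
        self.register_buffer("bin_centers", torch.linspace(angle_min, angle_max, num_bins))

    def forward(self, features):
        logits = self.fc(features)                        # (B, num_bins)
        probs = F.softmax(logits, dim=-1)
        angle = (probs * self.bin_centers).sum(dim=-1)    # (B,) continuous angle estimate
        return logits, angle

def kd_loss(student_logits, teacher_logits, T=2.0):
    """Response-based KD: the student matches the teacher's (e.g. ensemble's)
    softened bin distribution via KL divergence at temperature T."""
    return F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
```

In this reading, the ensemble's averaged bin distributions over the grid of bounding-box offsets would serve as the teacher logits, and the final single-RvC-head student is trained against them with `kd_loss`.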