Knowledge distillation (KD) is a successful approach for deep neural network acceleration, in which a compact network (student) is trained to mimic the softmax output of a pre-trained high-capacity network (teacher). Traditionally, KD relies on access to the training samples and the parameters of the white-box teacher to acquire the transferred knowledge. However, these prerequisites are not always realistic in real-world applications due to storage costs or privacy issues. Here we propose the concept of decision-based black-box (DB3) knowledge distillation, in which the student is trained by distilling knowledge from a black-box teacher (whose parameters are not accessible) that returns only hard class labels rather than softmax outputs. We start with the scenario in which the training set is accessible. We represent a sample's robustness against other classes by its distances to the teacher's decision boundaries and use this robustness to construct a soft label for each training sample. After that, the student can be trained via standard KD. We then extend this approach to a more challenging scenario in which even the training data are not accessible. We propose to generate pseudo samples that are distinguished by the teacher's decision boundaries to the largest extent and to construct soft labels for them; these labeled pseudo samples serve as the transfer set. We evaluate our approaches on various benchmark networks and datasets, and the experimental results demonstrate their effectiveness. Code is available at: https://github.com/zwang84/zsdb3kd.
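As a rough illustration of the first setting (training samples available, teacher returning only class labels), the sketch below estimates a sample's distance to the teacher's decision boundary toward each other class by binary search toward an anchor sample of that class, converts the distances into a soft label, and runs a standard KD update. The names `boundary_distance`, `db3_soft_label`, `kd_step`, the per-class anchor samples, and the hyper-parameters `beta` and `T` are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F


def boundary_distance(teacher, x, x_target, steps=20):
    """Estimate the distance from x to the teacher's decision boundary toward the
    class of x_target by binary search on the segment between the two samples.
    `teacher` is a black-box callable returning only a class index; assumes
    teacher(x) != teacher(x_target)."""
    y = teacher(x)
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        if teacher((1 - mid) * x + mid * x_target) == y:
            lo = mid  # still on the original class's side of the boundary
        else:
            hi = mid  # crossed into the other class
    return hi * torch.norm(x_target - x)


def db3_soft_label(teacher, x, anchors, num_classes, beta=1.0):
    """Build a soft label from boundary distances: the teacher's hard label keeps
    the largest logit, and a class whose boundary is closer to x receives a larger
    probability (one plausible mapping, not necessarily the paper's exact formula).
    `anchors[c]` is any available sample the teacher assigns to class c."""
    y = teacher(x)
    logits = torch.zeros(num_classes)
    for c in range(num_classes):
        if c != y:
            logits[c] = -beta * boundary_distance(teacher, x, anchors[c])
    return F.softmax(logits, dim=0)


def kd_step(student, optimizer, x_batch, soft_labels, T=4.0):
    """Standard KD update: match the student's tempered softmax to the constructed
    soft labels (stacked into a (batch, num_classes) tensor) via KL divergence."""
    optimizer.zero_grad()
    log_p = F.log_softmax(student(x_batch) / T, dim=1)
    loss = F.kl_div(log_p, soft_labels, reduction='batchmean') * (T * T)
    loss.backward()
    optimizer.step()
    return loss.item()
```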
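A similarly hedged sketch of the data-free setting: pseudo samples are hill-climbed from random noise so that the teacher classifies them far from its decision boundaries, with the nearest-boundary distance estimated by random-direction search. The random-search loop, `sigma`, `radius`, and the acceptance rule are assumptions for illustration, not the paper's exact optimization procedure; soft labels for the accepted pseudo samples can then be built as in the previous sketch.

```python
import torch


def nearest_boundary_distance(teacher, x, trials=10, steps=20, radius=10.0):
    """Rough estimate of the distance from x to the teacher's nearest decision
    boundary: walk along random directions until the predicted class changes,
    binary-search the crossing, and keep the smallest distance found (capped at
    `radius` when no boundary is hit)."""
    y = teacher(x)
    best = radius
    for _ in range(trials):
        d = torch.randn_like(x)
        d = d / torch.norm(d)
        if teacher(x + radius * d) == y:
            continue  # no label change along this direction within the search radius
        lo, hi = 0.0, radius
        for _ in range(steps):
            mid = (lo + hi) / 2.0
            if teacher(x + mid * d) == y:
                lo = mid
            else:
                hi = mid
        best = min(best, hi)
    return best


def generate_pseudo_sample(teacher, shape, iters=200, sigma=0.05):
    """Hill-climb from random noise toward a pseudo sample that the teacher
    classifies with a large margin, i.e. a large distance to its decision
    boundaries; such samples, with constructed soft labels, form the transfer set."""
    x = torch.rand(shape)
    best = nearest_boundary_distance(teacher, x)
    for _ in range(iters):
        candidate = (x + sigma * torch.randn_like(x)).clamp(0.0, 1.0)
        score = nearest_boundary_distance(teacher, candidate)
        if score > best:  # keep perturbations that push x away from the boundaries
            x, best = candidate, score
    return x
```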