Self-training has achieved enormous success in various semi-supervised and weakly-supervised learning tasks. The method can be interpreted as a teacher-student framework, in which the teacher generates pseudo-labels and the student makes predictions. The two models are updated alternately. However, such a straightforward alternating update rule leads to training instability, because a small change in the teacher may result in a significant change in the student. To address this issue, we propose {\ours}, short for differentiable self-training, which treats the teacher-student framework as a Stackelberg game. In this game, the leader is always in a more advantageous position than the follower. In self-training, the student contributes to the prediction performance, and the teacher controls the training process by generating pseudo-labels. Therefore, we treat the student as the leader and the teacher as the follower. The leader gains its advantage by accounting for the follower's strategy, which involves differentiable pseudo-labels and differentiable sample weights. Consequently, the leader-follower interaction can be effectively captured via the Stackelberg gradient, obtained by differentiating the follower's strategy. Experimental results on semi- and weakly-supervised classification and named entity recognition tasks show that our model outperforms existing approaches by large margins.
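To make the idea of a Stackelberg gradient concrete, the following is a minimal toy sketch, not the method proposed here. It assumes a scalar student parameter `theta`, a single unlabeled input `x`, and a hypothetical follower strategy in which the teacher emits a sharpened copy of the student's prediction as a soft pseudo-label. All function names and the `sharpen` factor are illustrative assumptions. The sketch contrasts the gradient that treats the pseudo-label as a constant (as in an alternating update) with the gradient that also differentiates through the follower's strategy.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def teacher_pseudo_label(theta, x, sharpen=2.0):
    # Hypothetical follower strategy (an assumption for illustration):
    # a differentiable soft pseudo-label that depends on the student's parameter.
    return sigmoid(sharpen * theta * x)


def student_loss(theta, p, x):
    # Binary cross-entropy between the pseudo-label p and the student's prediction q.
    q = sigmoid(theta * x)
    return -(p * math.log(q) + (1.0 - p) * math.log(1.0 - q))


def alternating_gradient(theta, x, sharpen=2.0):
    # Treats the pseudo-label as a constant: only the direct term dL/dtheta.
    p = teacher_pseudo_label(theta, x, sharpen)
    q = sigmoid(theta * x)
    return (q - p) * x


def stackelberg_gradient(theta, x, sharpen=2.0):
    # Total derivative: direct term plus (dL/dp) * (dp/dtheta),
    # i.e. the leader differentiates through the follower's strategy.
    p = teacher_pseudo_label(theta, x, sharpen)
    direct = alternating_gradient(theta, x, sharpen)
    dloss_dp = -theta * x  # -(log q - log(1 - q)) = -logit(q)
    dp_dtheta = sharpen * x * p * (1.0 - p)
    return direct + dloss_dp * dp_dtheta


def finite_difference(theta, x, sharpen=2.0, eps=1e-6):
    # Numerical check of the total derivative of the composed objective.
    def total(t):
        return student_loss(t, teacher_pseudo_label(t, x, sharpen), x)

    return (total(theta + eps) - total(theta - eps)) / (2.0 * eps)


if __name__ == "__main__":
    theta, x = 0.5, 1.5
    print("alternating :", alternating_gradient(theta, x))
    print("stackelberg :", stackelberg_gradient(theta, x))
    print("finite diff :", finite_difference(theta, x))
```

In this toy setting, the Stackelberg gradient agrees with the finite-difference estimate of the composed objective, while the alternating gradient omits the term that captures how the pseudo-label responds to a change in the student.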