In several supervised learning scenarios, auxiliary losses are used to introduce additional information or constraints into the supervised learning objective. For instance, knowledge distillation aims to mimic the outputs of a powerful teacher model; similarly, in rule-based approaches, weak labeling information is provided by labeling functions, which may be noisy rule-based approximations of the true labels. We tackle the problem of learning to combine these losses in a principled manner. Our proposal, AMAL, uses a bi-level optimization criterion on validation data to learn optimal mixing weights, at an instance level, over the training data. We describe a meta-learning approach for solving this bi-level objective and show how it can be applied to different scenarios in supervised learning. Experiments in a number of knowledge distillation and rule-denoising domains show that AMAL provides noticeable gains over competitive baselines. We empirically analyze our method and share insights into the mechanisms through which it yields these gains.
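To make the bi-level idea concrete, the following is a minimal sketch (not the authors' implementation) of learning instance-level mixing weights between a task loss and a distillation loss via a one-step meta-gradient, in the spirit described above. The toy linear student, the sigmoid parameterization of the per-instance weights, and the single inner update per outer step are all illustrative assumptions.

```python
# Sketch: instance-level loss mixing learned by a one-step meta-gradient.
# Everything here (model, losses, step sizes) is a simplifying assumption,
# not the paper's actual algorithm.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n_train, n_val, d, k = 64, 32, 10, 3

x_tr, y_tr = torch.randn(n_train, d), torch.randint(k, (n_train,))
x_va, y_va = torch.randn(n_val, d), torch.randint(k, (n_val,))
teacher_logits = torch.randn(n_train, k)           # stand-in teacher outputs

W = torch.randn(d, k, requires_grad=True)          # student: one linear layer
alpha = torch.zeros(n_train, requires_grad=True)   # per-instance mixing logits
lr_inner, lr_outer = 0.1, 0.05

for step in range(100):
    # Inner objective: per-instance mix of task loss and distillation loss.
    logits = x_tr @ W
    ce = F.cross_entropy(logits, y_tr, reduction="none")
    kd = F.kl_div(F.log_softmax(logits, dim=1),
                  F.softmax(teacher_logits, dim=1),
                  reduction="none").sum(dim=1)
    lam = torch.sigmoid(alpha)                      # mixing weight in (0, 1)
    train_loss = ((1 - lam) * ce + lam * kd).mean()

    # One differentiable inner update of the student parameters.
    g = torch.autograd.grad(train_loss, W, create_graph=True)[0]
    W_new = W - lr_inner * g

    # Outer objective: validation loss of the updated student; its gradient
    # w.r.t. alpha flows back through the inner update.
    val_loss = F.cross_entropy(x_va @ W_new, y_va)
    alpha_grad = torch.autograd.grad(val_loss, alpha)[0]

    # Update the mixing weights by the meta-gradient, then commit the
    # inner step on the student.
    with torch.no_grad():
        alpha -= lr_outer * alpha_grad
        W -= lr_inner * g
```

In this alternating scheme, instances whose auxiliary signal (here, the teacher's soft labels) helps validation performance receive larger mixing weights, while noisy or misleading instances are down-weighted.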