In this demonstration, we present an efficient BERT-based multi-task (MT) framework that is particularly suitable for iterative and incremental development of tasks. The proposed framework is based on partial fine-tuning, i.e., fine-tuning only some of the top layers of BERT while keeping the other layers frozen. For each task, we independently train a single-task (ST) model using partial fine-tuning. We then compress the task-specific layers of each ST model using knowledge distillation. The compressed ST models are finally merged into one MT model so that their frozen layers are shared across tasks. We exemplify our approach on eight GLUE tasks, demonstrating that it achieves both strong performance and efficiency. We have implemented our method in the utterance understanding system of XiaoAI, a commercial AI assistant developed by Xiaomi. We estimate that our model reduces the overall serving cost by 86%.
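
To make the partial fine-tuning idea concrete, below is a minimal sketch using PyTorch and the HuggingFace Transformers library (assumed tooling; the split index, checkpoint name, and task head are illustrative choices, not taken from the paper): the embeddings and bottom encoder layers are frozen so they can later be shared across tasks, while only the top layers and a task-specific head receive gradients.

```python
# Minimal sketch of partial fine-tuning: freeze the bottom BERT layers,
# fine-tune only the top layers plus a task-specific head.
import torch
from transformers import BertModel

NUM_FROZEN_LAYERS = 8  # hypothetical split point; the paper does not fix this here

bert = BertModel.from_pretrained("bert-base-uncased")

# Freeze embeddings and the bottom encoder layers; these parameters stay
# identical across tasks and can be shared in the merged MT model.
for param in bert.embeddings.parameters():
    param.requires_grad = False
for layer in bert.encoder.layer[:NUM_FROZEN_LAYERS]:
    for param in layer.parameters():
        param.requires_grad = False

# Only the top encoder layers and the task head are trainable.
task_head = torch.nn.Linear(bert.config.hidden_size, 2)  # e.g. a binary GLUE task
trainable_params = [p for p in bert.parameters() if p.requires_grad]
trainable_params += list(task_head.parameters())
optimizer = torch.optim.AdamW(trainable_params, lr=2e-5)
```

Under this setup, only the unfrozen top layers and the head are task-specific, which is what the subsequent distillation step would compress before the ST models are merged into one MT model.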