Continual learning refers to a dynamic framework in which a model or agent receives a stream of non-stationary data over time and must adapt to new data while preserving previously acquired knowledge. Unfortunately, deep neural networks fail to meet these two desiderata, incurring the so-called catastrophic forgetting phenomenon. Whereas a vast array of strategies have been proposed to attenuate forgetting in the computer vision domain, for speech-related tasks there is a dearth of works. In this paper, we turn our attention toward the joint use of rehearsal and knowledge distillation (KD) approaches for spoken language understanding under a class-incremental learning scenario. We report on multiple KD combinations at different levels in the network, showing that combining feature-level and prediction-level KDs leads to the best results. Finally, we provide an ablation study on the effect of the size of the rehearsal memory that corroborates the appropriateness of our approach for low-resource devices.
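To make the abstract's combination concrete, the sketch below illustrates one common way to pair a classification loss on current and rehearsal samples with both a prediction-level KD term (KL divergence on temperature-softened logits of the previous-task teacher) and a feature-level KD term (MSE on intermediate representations). This is a minimal, hedged illustration under assumed ingredients; the function name, weighting hyperparameters, and the specific distance choices are illustrative and not taken from the paper.

```python
import torch
import torch.nn.functional as F

def combined_kd_loss(student_logits, teacher_logits,
                     student_feats, teacher_feats,
                     labels, temperature=2.0,
                     lambda_pred=1.0, lambda_feat=1.0):
    """Illustrative combination of cross-entropy with prediction-level
    and feature-level knowledge distillation (hyperparameters assumed)."""
    # Standard classification loss on the current batch, which may mix
    # new-task samples with samples drawn from the rehearsal memory.
    ce = F.cross_entropy(student_logits, labels)

    # Prediction-level KD: match the softened output distribution
    # of the model trained on previous tasks (the teacher).
    kd_pred = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    # Feature-level KD: keep intermediate encoder representations
    # close to those produced by the teacher.
    kd_feat = F.mse_loss(student_feats, teacher_feats)

    return ce + lambda_pred * kd_pred + lambda_feat * kd_feat
```

In practice the two distillation terms act at different depths of the network, which is what the abstract refers to as combining KD "at different levels"; the exact layers distilled and the loss weights would follow the paper's experimental setup.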