Lifelong language learning seeks to have models continuously learn multiple tasks in sequential order without suffering from catastrophic forgetting. State-of-the-art approaches rely on sparse experience replay as the primary mechanism to prevent forgetting. Experience replay typically relies on sampling methods to populate the memory; however, the effect of the chosen sampling strategy on model performance has not yet been studied. In this paper, we investigate how relevant selective memory population is in the lifelong learning process of text classification and question-answering tasks. We find that methods that randomly store a uniform number of samples from the entire data stream lead to high performance, especially for low memory sizes, which is consistent with findings from computer vision studies.
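As an illustrative sketch (not a method described in the paper), a uniform random sample over the entire data stream can be maintained with bounded memory via classic reservoir sampling; the function name and interface below are assumptions for illustration only.

```python
import random

def reservoir_sample(stream, memory_size):
    """Maintain a uniform random sample of `memory_size` examples from a data stream.

    After processing i examples, every example seen so far has the same
    probability (memory_size / i) of residing in memory, regardless of
    the total stream length (classic reservoir sampling).
    """
    memory = []
    for i, example in enumerate(stream):
        if len(memory) < memory_size:
            # Fill the memory buffer until it reaches capacity.
            memory.append(example)
        else:
            # Keep the new example with probability memory_size / (i + 1),
            # replacing a uniformly chosen stored example.
            j = random.randint(0, i)
            if j < memory_size:
                memory[j] = example
    return memory
```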