In lifelong learning, the tasks (or classes) to be learned arrive sequentially over time in arbitrary order. During training, knowledge from previous tasks can be captured and transferred to subsequent ones to improve sample efficiency. We consider the setting where all target tasks can be represented in the span of a small number of unknown linear or nonlinear features of the input data. We propose a provable lifelong learning algorithm that maintains and refines the internal feature representation. We prove that for any desired accuracy on all tasks, the dimension of the representation remains close to that of the underlying representation. The resulting sample complexity improves significantly on existing bounds. In the setting of linear features, our algorithm is provably efficient and the sample complexity for input dimension $d$, $m$ tasks with $k$ features up to error $\epsilon$ is $\tilde{O}(dk^{1.5}/\epsilon+km/\epsilon)$. We also prove a matching lower bound for any lifelong learning algorithm that uses a single task learner as a black box. Finally, we complement our analysis with an empirical study.
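To make the linear-feature setting concrete, here is a minimal sketch, not the paper's algorithm, of lifelong learning with a shared linear representation: tasks arrive one at a time, each is fit in the span of the currently learned features, and a new feature direction is added only when the residual error exceeds a tolerance. The data-generating model (targets in the span of $k$ unknown linear features) follows the abstract; the variable names, the tolerance `tol`, and the residual-based expansion rule are illustrative assumptions.

```python
# Sketch of lifelong learning with a shared linear representation (illustrative,
# not the paper's exact algorithm). Each task's target lies in the span of k
# unknown linear features of the input; the learner grows its representation
# only when the current features cannot fit a new task.
import numpy as np

rng = np.random.default_rng(0)
d, k, m, n = 20, 3, 10, 200           # input dim, true features, tasks, samples/task
A_true = rng.standard_normal((k, d))  # unknown shared feature map (ground truth)

features = np.empty((0, d))           # learned representation, grown on demand
tol = 1e-6                            # hypothetical accuracy threshold

for t in range(m):
    X = rng.standard_normal((n, d))
    w = rng.standard_normal(k)
    y = X @ A_true.T @ w              # task t lies in the span of the k features

    # Try to fit task t using only the current representation.
    if len(features) > 0:
        Z = X @ features.T
        coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
        residual = y - Z @ coef
    else:
        residual = y

    if np.mean(residual**2) > tol:
        # Representation is insufficient: add the direction that best
        # explains the residual (one least-squares solve on X).
        new_dir, *_ = np.linalg.lstsq(X, residual, rcond=None)
        new_dir /= np.linalg.norm(new_dir)
        features = np.vstack([features, new_dir])

print(f"learned {len(features)} features; underlying representation has rank {k}")
```

In this noiseless sketch each added direction lies in the row span of `A_true`, so the representation stops growing after at most $k$ tasks, mirroring the abstract's claim that the maintained dimension stays close to that of the underlying representation.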