A hallmark of human intelligence is the ability to construct self-contained chunks of knowledge and reuse them in novel combinations for solving different problems. Learning such compositional structures has been a challenge for artificial systems, due to the underlying combinatorial search. To date, research into compositional learning has largely proceeded separately from work on lifelong or continual learning. This dissertation integrated these two lines of work to present a general-purpose framework for lifelong learning of functionally compositional structures. The framework separates the learning into two stages: learning how to combine existing components to assimilate a novel problem, and learning how to adapt the existing components to accommodate the new problem. This separation explicitly handles the trade-off between stability and flexibility. This dissertation instantiated the framework into various supervised and reinforcement learning (RL) algorithms. Supervised learning evaluations found that 1) compositional models improve lifelong learning of diverse tasks, 2) the multi-stage process permits lifelong learning of compositional knowledge, and 3) the components learned by the framework represent self-contained and reusable functions. Similar RL evaluations demonstrated that 1) algorithms under the framework accelerate the discovery of high-performing policies, and 2) these algorithms retain or improve performance on previously learned tasks. The dissertation extended one lifelong compositional RL algorithm to the nonstationary setting, where the task distribution varies over time, and found that modularity permits individually tracking changes to different elements in the environment. The final contribution of this dissertation was a new benchmark for compositional RL, which exposed that existing methods struggle to discover the compositional properties of the environment.
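The two-stage separation described above can be illustrated with a toy sketch. This is not the dissertation's algorithm: it assumes, purely for illustration, that components are weight vectors and that a task model is a linear combination of them. The names `assimilate` and `accommodate` and all hyperparameters are hypothetical. Stage 1 freezes the components and learns only how to combine them; stage 2 freezes the combination and makes a few small updates to the components.

```python
import random

# Toy illustration only: components are weight vectors, and a task's linear
# model is a weighted sum of them. All names and settings are hypothetical.

def predict(components, coeffs, x):
    # Task weight vector = linear combination of the shared components.
    w = [sum(s * comp[j] for s, comp in zip(coeffs, components))
         for j in range(len(x))]
    return sum(wj * xj for wj, xj in zip(w, x))

def mse(components, coeffs, data):
    return sum((predict(components, coeffs, x) - y) ** 2
               for x, y in data) / len(data)

def assimilate(components, data, steps=300, lr=0.1):
    """Stage 1: components are frozen; learn only the combination coefficients."""
    coeffs = [0.0] * len(components)
    for _ in range(steps):
        grad = [0.0] * len(components)
        for x, y in data:
            err = predict(components, coeffs, x) - y
            for k, comp in enumerate(components):
                grad[k] += 2 * err * sum(c * xj for c, xj in zip(comp, x)) / len(data)
        coeffs = [s - lr * g for s, g in zip(coeffs, grad)]
    return coeffs

def accommodate(components, coeffs, data, steps=20, lr=0.01):
    """Stage 2: coefficients are frozen; take a few small steps on the components."""
    comps = [list(c) for c in components]
    for _ in range(steps):
        grads = [[0.0] * len(c) for c in comps]
        for x, y in data:
            err = predict(comps, coeffs, x) - y
            for k in range(len(comps)):
                for j in range(len(x)):
                    grads[k][j] += 2 * err * coeffs[k] * x[j] / len(data)
        comps = [[c - lr * g for c, g in zip(comp, grad)]
                 for comp, grad in zip(comps, grads)]
    return comps

# Existing components cover the first two input dimensions only.
components = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

# New task: mostly expressible with the existing components, plus a small
# part (the third weight) that they cannot represent.
rng = random.Random(0)
true_w = [2.0, -1.0, 0.3]
data = []
for _ in range(50):
    x = [rng.uniform(-1, 1) for _ in range(3)]
    data.append((x, sum(w * xi for w, xi in zip(true_w, x))))

loss_initial = mse(components, [0.0, 0.0], data)
coeffs = assimilate(components, data)
loss_assimilated = mse(components, coeffs, data)
new_components = accommodate(components, coeffs, data)
loss_accommodated = mse(new_components, coeffs, data)

print(coeffs, loss_initial, loss_assimilated, loss_accommodated)
```

Keeping the accommodation stage short and low in learning rate is one simple way to favor stability. In this sketch the components change only slightly, while the combination step does most of the work of fitting the new task.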