Decomposing knowledge into interchangeable pieces promises a generalization advantage when there are changes in distribution. A learning agent interacting with its environment is likely to be faced with situations requiring novel combinations of existing pieces of knowledge. We hypothesize that such a decomposition of knowledge is particularly relevant for generalizing in a systematic manner to out-of-distribution changes. To study these ideas, we propose a training framework in which we assume that the pieces of knowledge an agent needs, as well as its reward function, are stationary and can be re-used across tasks. An attention mechanism dynamically selects which modules can be adapted to the current task, and the parameters of the selected modules are allowed to change quickly as the learner is confronted with variations in what it experiences, while the parameters of the attention mechanism act as stable, slowly changing meta-parameters. We focus on pieces of knowledge captured by an ensemble of modules that communicate sparsely with each other via a bottleneck of attention. We find that meta-learning the modular aspects of the proposed system greatly helps in achieving faster adaptation in a reinforcement learning setup involving navigation in a partially observed grid world with image-level input. We also find that reversing the roles of parameters and meta-parameters does not work nearly as well, suggesting a particular role for fast adaptation of the dynamically selected modules.
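To make the parameter / meta-parameter split concrete, here is a minimal sketch in PyTorch, not the authors' implementation: the class name, the GRU-cell modules, the single attention query, and all dimensions and learning rates are hypothetical choices for illustration. A softmax attention over recurrent modules selects the top-k modules that process the current input, and the optimizer assigns a fast learning rate to the module parameters and a slow one to the attention parameters.

```python
# Minimal sketch of the fast/slow split described in the abstract.
# Assumed/hypothetical: ModularAgent, GRU-cell modules, the attention query,
# and all shapes and learning rates.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModularAgent(nn.Module):
    def __init__(self, n_modules=4, in_dim=32, hid_dim=64, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Ensemble of small recurrent modules (the "pieces of knowledge").
        self.mods = nn.ModuleList(
            [nn.GRUCell(in_dim, hid_dim) for _ in range(n_modules)]
        )
        # Attention that scores each module against the input; its parameters
        # play the role of the slowly changing meta-parameters.
        self.attn_query = nn.Linear(in_dim, hid_dim)

    def forward(self, x, hidden):
        # x: (batch, in_dim); hidden: (n_modules, batch, hid_dim)
        q = self.attn_query(x)                            # (batch, hid_dim)
        scores = torch.einsum('bd,mbd->bm', q, hidden)    # (batch, n_modules)
        weights = F.softmax(scores, dim=-1)
        # Sparse selection: only the top-k modules update their state
        # (the attention bottleneck).
        topk = weights.topk(self.top_k, dim=-1).indices
        mask = torch.zeros_like(weights).scatter_(1, topk, 1.0)
        new_hidden = []
        for m, cell in enumerate(self.mods):
            # Selected modules read the input scaled by their soft attention
            # weight, so gradients also reach the attention parameters.
            h_new = cell(weights[:, m:m + 1] * x, hidden[m])
            gate = mask[:, m:m + 1]                       # (batch, 1)
            new_hidden.append(gate * h_new + (1 - gate) * hidden[m])
        return torch.stack(new_hidden)

agent = ModularAgent()
# Fast adaptation for the module parameters, slow change for the attention
# (meta-)parameters.
optimizer = torch.optim.Adam([
    {"params": agent.mods.parameters(), "lr": 1e-3},        # fast
    {"params": agent.attn_query.parameters(), "lr": 1e-5},  # slow
])

# One step on dummy data with a placeholder objective, just to show the flow.
x = torch.randn(8, 32)
h = torch.zeros(4, 8, 64)
h = agent(x, h)
loss = h.pow(2).mean()
loss.backward()
optimizer.step()
```

Reversing the roles in this sketch would simply swap the two learning rates, which is the ablation the abstract reports working much worse.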