A fundamental trait of intelligence is the ability to achieve goals in the face of novel circumstances, such as making decisions from new action choices. However, standard reinforcement learning assumes a fixed set of actions and requires expensive retraining when given a new action set. To make learning agents more adaptable, we introduce the problem of zero-shot generalization to new actions. We propose a two-stage framework in which the agent first infers action representations from action information acquired separately from the task. A policy flexible to varying action sets is then trained with generalization objectives. We benchmark generalization on sequential tasks, such as selecting from an unseen tool-set to solve physical reasoning puzzles and stacking towers with novel 3D shapes. Videos and code are available at https://sites.google.com/view/action-generalization.
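To make the two-stage framework concrete, here is a minimal sketch, assuming a PyTorch implementation; the class names (ActionEncoder, SetPolicy), network sizes, and feature dimensions are illustrative assumptions rather than the paper's actual architecture, and the training-time generalization objectives are omitted.

```python
import torch
import torch.nn as nn

class ActionEncoder(nn.Module):
    """Stage 1 (sketch): map task-independent action information
    (e.g., features describing a tool or a 3D shape) to a latent
    action representation."""
    def __init__(self, info_dim, repr_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(info_dim, 64), nn.ReLU(),
            nn.Linear(64, repr_dim),
        )

    def forward(self, action_info):
        return self.net(action_info)

class SetPolicy(nn.Module):
    """Stage 2 (sketch): score each available action representation
    against the state, so the action set can vary in size and
    content at test time."""
    def __init__(self, state_dim, repr_dim):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(state_dim + repr_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, state, action_reprs):
        # state: (state_dim,); action_reprs: (num_actions, repr_dim)
        n = action_reprs.shape[0]
        pairs = torch.cat([state.expand(n, -1), action_reprs], dim=-1)
        logits = self.score(pairs).squeeze(-1)  # one logit per action
        return torch.distributions.Categorical(logits=logits)

# Zero-shot usage: swap in an unseen action set of any size.
encoder = ActionEncoder(info_dim=16, repr_dim=8)
policy = SetPolicy(state_dim=32, repr_dim=8)
new_action_info = torch.randn(5, 16)   # 5 previously unseen actions
dist = policy(torch.randn(32), encoder(new_action_info))
action_index = dist.sample()           # index into the new action set
```

Because the policy scores state-representation pairs rather than indexing into a fixed output layer, nothing about the network ties it to the training-time action set; only the encoder's input features constrain which new actions can be represented.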