Sample-efficient generalisation of reinforcement learning approaches has always been a challenge, especially for complex scenes with many components. In this work, we introduce Plug and Play Markov Decision Processes, an object-based representation that allows zero-shot integration of new objects from known object classes. This is achieved by representing the global transition dynamics as a union of local transition functions, each defined with respect to one active object in the scene. Transition dynamics for an object class can be pre-learnt and are thus ready to use in a new environment. Each active object is also endowed with its own reward function. Since there is no central reward function, the addition or removal of objects can be handled efficiently by updating only the reward functions of the objects involved. A new transfer learning mechanism is also proposed to adapt the reward functions in such cases. Experiments show that our representation achieves sample efficiency in a variety of set-ups.
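As a rough illustration of the decomposition described above, one plausible reading (the notation below is ours, not taken from the paper) is that with n active objects, each object i of class c_i contributes a local transition function over its own state s_i and a local reward r_i, so the global dynamics and reward factor object-wise:

% Illustrative sketch only: object-wise factorisation of dynamics and reward.
% Symbols s_i, T_{c_i}, r_i and n are assumed notation, not the paper's.
\[
T\!\left(s' \mid s, a\right) \;=\; \prod_{i=1}^{n} T_{c_i}\!\left(s'_i \mid s_i, a\right),
\qquad
R\!\left(s, a\right) \;=\; \sum_{i=1}^{n} r_i\!\left(s_i, a\right).
\]

Under such a factorisation, plugging in a new object of a known class only requires reusing the pre-learnt T_{c_i} and attaching (or adapting) its local reward r_i, leaving the rest of the model untouched.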