The ability to generalize to previously unseen tasks with little to no supervision is a key challenge in modern machine learning research, and a cornerstone of any future "general AI". An artificially intelligent agent deployed in a real-world application must adapt on the fly to unknown environments. Researchers often rely on reinforcement and imitation learning to provide online adaptation to new tasks through trial-and-error learning. However, this can be challenging for complex tasks that require many timesteps or large numbers of subtasks to complete. These "long-horizon" tasks suffer from sample inefficiency and can require extremely long training times before the agent learns the necessary long-term planning. In this work, we introduce CASE, which addresses these issues by training an imitation learning agent with adaptive "near-future" subgoals. These subgoals are recalculated at each step using compositional arithmetic in a learned latent representation space. In addition to improving learning efficiency on standard long-horizon tasks, this approach also enables one-shot generalization to previously unseen tasks, given only a single reference trajectory for the task in a different environment. Our experiments show that the proposed approach consistently outperforms the previous state-of-the-art compositional imitation learning approach by 30%.
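To make the core idea concrete, the following is a minimal sketch (not the paper's actual implementation) of computing a "near-future" subgoal by compositional arithmetic in a latent space: the latent offset that a reference trajectory makes over the next few steps is transplanted onto the agent's current latent state. The encoder, the `near_future_subgoal` helper, and the `horizon` parameter are all hypothetical stand-ins introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
obs_dim, latent_dim = 16, 8

# Hypothetical stand-in for a learned encoder: a fixed random linear projection.
W = rng.normal(size=(obs_dim, latent_dim))
def encode(obs):
    return obs @ W

# A reference trajectory recorded in a *different* environment
# (the single demonstration used for one-shot generalization).
ref_traj = rng.normal(size=(10, obs_dim))
z_ref = encode(ref_traj)

def near_future_subgoal(z_now, z_ref, horizon=3):
    """Compositional arithmetic in latent space: find the reference state
    closest to the agent's current latent, take the offset the reference
    makes over the next `horizon` steps, and apply it to the current state."""
    i = int(np.argmin(np.linalg.norm(z_ref - z_now, axis=1)))
    j = min(i + horizon, len(z_ref) - 1)
    offset = z_ref[j] - z_ref[i]   # progress the demonstration makes
    return z_now + offset          # transplanted onto the agent's state

# Recalculated at every step from the agent's current observation.
z_now = encode(rng.normal(size=obs_dim))
subgoal = near_future_subgoal(z_now, z_ref)
print(subgoal.shape)  # (8,)
```

Because the subgoal is an offset applied to the *current* state rather than an absolute target copied from the demonstration, the same reference trajectory can guide behavior in an environment whose states never coincide with the demo's.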