The ability to form complex plans from raw visual input is a litmus test for current artificial intelligence capabilities, as it requires a seamless combination of visual processing and abstract algorithmic execution, two traditionally separate areas of computer science. A recent surge of interest in this field has brought advances that yield good performance on tasks ranging from arcade games to continuous control; these methods, however, are not without significant issues, such as limited generalization and difficulties on combinatorially hard planning instances. Our contribution is two-fold: (i) we present a method that learns to represent its environment as a latent graph and leverages state reidentification to reduce the complexity of finding a good policy from exponential to linear; (ii) we introduce a set of lightweight environments with an underlying discrete combinatorial structure in which planning is challenging even for humans. Moreover, we show that our method achieves strong empirical generalization to variations in the environment, even in highly disadvantaged regimes, such as "one-shot" planning, or in an offline RL setting that provides only low-quality trajectories.
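To make the complexity claim concrete, the following is a minimal, hypothetical sketch (not the paper's implementation) of the general idea: observations are encoded into discrete latent codes, revisited codes are re-identified as the same graph node, and planning then reduces to a linear-time search over the resulting latent graph rather than an exponential search over action sequences. The names `encode`, `build_latent_graph`, and `plan` are illustrative assumptions.

```python
# Hypothetical sketch: re-identified latent states collapse the search space into a
# graph, so planning is linear in the number of nodes and edges instead of
# exponential in the planning horizon.
from collections import deque

def build_latent_graph(transitions, encode):
    """Build a graph over re-identified latent states from (obs, action, next_obs) tuples."""
    graph = {}  # latent state -> {action: successor latent state}
    for obs, action, next_obs in transitions:
        s, s_next = encode(obs), encode(next_obs)   # hashable latent codes
        graph.setdefault(s, {})[action] = s_next
        graph.setdefault(s_next, {})
    return graph

def plan(graph, start, goal):
    """Breadth-first search over the latent graph; cost is O(|nodes| + |edges|)."""
    frontier, parent = deque([start]), {start: None}
    while frontier:
        s = frontier.popleft()
        if s == goal:
            actions = []
            while parent[s] is not None:     # walk back to the start state
                prev, a = parent[s]
                actions.append(a)
                s = prev
            return list(reversed(actions))
        for a, s_next in graph.get(s, {}).items():
            if s_next not in parent:
                parent[s_next] = (s, a)
                frontier.append(s_next)
    return None  # goal not reachable in the observed graph
```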