Building sample-efficient agents that generalize out-of-distribution (OOD) in real-world settings remains a fundamental unsolved problem on the path towards achieving higher-level cognition. One particularly promising approach is to begin with low-dimensional, pretrained representations of our world, which should facilitate efficient downstream learning and generalization. By training 240 representations and over 10,000 reinforcement learning policies on a simulated robotic setup, we evaluate to what extent different properties of pretrained VAE-based representations affect the OOD generalization of downstream agents. We observe that many agents are surprisingly robust to realistic distribution shifts, including the challenging sim-to-real case. In addition, we find that the generalization performance of a simple downstream proxy task reliably predicts the generalization performance of our reinforcement learning control tasks under a wide range of practically relevant OOD settings. Such proxy tasks can thus be used to select pretrained representations that will lead to agents that generalize out-of-distribution.
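Below is a minimal, purely illustrative sketch of the selection idea described in the last sentence: ranking pretrained representations by the out-of-distribution score of a cheap proxy task instead of by expensive RL evaluations. All names and numbers (proxy_ood_score, rl_ood_return, the number of representations) are hypothetical placeholders, not results from the paper.

```python
# Hypothetical sketch: choose a pretrained representation by proxy-task OOD performance.
# The scores below are synthetic placeholders used only to illustrate the workflow.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# For each candidate pretrained (e.g. VAE-based) representation:
# proxy_ood_score[i] -- OOD score of a simple supervised proxy task (cheap to obtain)
# rl_ood_return[i]   -- OOD return of RL policies trained on that representation (expensive)
proxy_ood_score = rng.uniform(0.4, 0.9, size=8)
rl_ood_return = 0.8 * proxy_ood_score + rng.normal(0.0, 0.05, size=8)

# If the proxy task is predictive, its ranking of representations should agree
# with the ranking induced by the much costlier RL evaluation.
rho, _ = spearmanr(proxy_ood_score, rl_ood_return)
print(f"Spearman rank correlation (proxy vs. RL OOD): {rho:.2f}")

# Model selection without training any new RL policy: keep the representation
# whose proxy task generalizes best out-of-distribution.
best = int(np.argmax(proxy_ood_score))
print(f"Selected representation index: {best}")
```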