Building sample-efficient agents that generalize out-of-distribution (OOD) in real-world settings remains a fundamental unsolved problem on the path towards achieving higher-level cognition. One particularly promising approach is to begin with low-dimensional, pretrained representations of our world, which should facilitate efficient downstream learning and generalization. By training 240 representations and over 10,000 reinforcement learning (RL) policies on a simulated robotic setup, we evaluate to what extent different properties of pretrained VAE-based representations affect the OOD generalization of downstream agents. We observe that many agents are surprisingly robust to realistic distribution shifts, including the challenging sim-to-real case. In addition, we find that the generalization performance of a simple downstream proxy task reliably predicts the generalization performance of our RL agents under a wide range of OOD settings. Such proxy tasks can thus be used to select pretrained representations that will lead to agents that generalize.
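A minimal sketch of the selection recipe the abstract points to: rank pretrained encoders by the OOD score of a cheap downstream proxy task, and spend RL training compute only on the top-ranked ones. This is not code from the paper; the class, the function names, and the random scores are illustrative stand-ins, and the actual proxy task and RL training would come from the study's own pipeline.

```python
# Hedged sketch (illustrative, not the paper's implementation): select pretrained
# representations by proxy-task OOD score before committing to expensive RL training.

from dataclasses import dataclass
from typing import List
import random


@dataclass
class PretrainedEncoder:
    name: str               # stand-in identifier for one of the pretrained VAE encoders
    proxy_ood_score: float  # OOD performance of a cheap proxy task on this encoder


def evaluate_proxy_task(encoder: PretrainedEncoder) -> float:
    """Stand-in for evaluating a simple supervised proxy task (trained on top of the
    frozen representation) on out-of-distribution inputs; higher is better."""
    return encoder.proxy_ood_score


def train_rl_agent(encoder: PretrainedEncoder) -> None:
    """Stand-in for the expensive step: training an RL policy on the frozen representation."""
    print(f"training RL agent on top of {encoder.name}")


def select_encoders(encoders: List[PretrainedEncoder], k: int) -> List[PretrainedEncoder]:
    """Rank encoders by proxy-task OOD score and keep only the top k for RL training."""
    return sorted(encoders, key=evaluate_proxy_task, reverse=True)[:k]


if __name__ == "__main__":
    random.seed(0)
    # Dummy pool of candidate representations with placeholder scores.
    pool = [PretrainedEncoder(f"vae_{i:03d}", random.random()) for i in range(240)]
    for enc in select_encoders(pool, k=5):
        train_rl_agent(enc)
```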