We present lilGym, a new benchmark for language-conditioned reinforcement learning in visual environments. lilGym is based on 2,661 highly compositional human-written natural language statements grounded in an interactive visual environment. We introduce a new approach for exact reward computation in every possible world state by annotating all statements with executable Python programs. Each statement is paired with multiple start states and reward functions to form thousands of distinct Markov Decision Processes of varying difficulty. We experiment with lilGym using different models and learning regimes. Our results and analysis show that while existing methods are able to achieve non-trivial performance, lilGym forms a challenging open problem. lilGym is available at https://lil.nlp.cornell.edu/lilgym/.
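To make the reward-computation idea concrete, the following is a minimal sketch of how a statement annotated with an executable Python program could yield an exact reward in any world state. The state representation, the example statement, and all names (`Item`, `statement_program`, `reward`) are hypothetical illustrations and assumptions, not the benchmark's actual code or reward structure.

```python
# Minimal sketch (hypothetical, not lilGym's actual implementation):
# a statement's executable annotation evaluated against a symbolic
# world state to produce an exact reward.

from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    color: str   # e.g. "yellow", "black", "blue"
    shape: str   # e.g. "square", "circle", "triangle"
    box: int     # index of the box the item is placed in

# Hypothetical executable annotation for the statement
# "there is exactly one yellow square":
def statement_program(state: list[Item]) -> bool:
    return sum(
        1 for it in state if it.color == "yellow" and it.shape == "square"
    ) == 1

def reward(state: list[Item], stopped: bool) -> float:
    # Assumed reward shape for illustration only: 0 while acting,
    # +1 if the agent stops in a state satisfying the statement, -1 otherwise.
    if not stopped:
        return 0.0
    return 1.0 if statement_program(state) else -1.0

# Example usage
state = [Item("yellow", "square", 0), Item("blue", "circle", 1)]
print(reward(state, stopped=True))  # 1.0
```

Because the annotation is an ordinary program over the symbolic state, the same evaluation applies to every reachable configuration, which is what allows exact rewards for all possible world states.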