We present lilGym, a new benchmark for language-conditioned reinforcement learning in visual environments. lilGym is based on 2,661 highly-compositional human-written natural language statements grounded in an interactive visual environment. We annotate all statements with executable Python programs representing their meaning to enable exact reward computation in every possible world state. Each statement is paired with multiple start states and reward functions to form thousands of distinct Markov Decision Processes of varying difficulty. We experiment on lilGym with different models and learning regimes. Our results and analysis show that while existing methods are able to achieve non-trivial performance, lilGym remains a challenging open problem. lilGym is available at https://lil.nlp.cornell.edu/lilgym/.