The last decade has seen an upswing in interest in and adoption of reinforcement learning (RL) techniques, in large part due to their demonstrated ability to perform certain tasks at "super-human levels". This has incentivized the community to prioritize research that demonstrates RL agent performance, often at the expense of research aimed at understanding agents' learning dynamics. Performance-focused research runs the risk of overfitting to academic benchmarks -- thereby rendering them less useful -- which can make it difficult to transfer proposed techniques to novel problems. Further, it implicitly diminishes work that does not push the performance frontier but aims at improving our understanding of these techniques. This paper argues two points: (i) RL research should stop focusing solely on demonstrating agent capabilities, and focus more on advancing the science and understanding of reinforcement learning; and (ii) we need to be more precise about how our benchmarks map to the underlying mathematical formalisms. We use the popular Arcade Learning Environment (ALE; Bellemare et al., 2013) as an example of a benchmark that, despite being increasingly considered "saturated", can be effectively used for developing this understanding and for facilitating the deployment of RL techniques in impactful real-world problems.