The Formalism-Implementation Gap in Reinforcement Learning Research

The last decade has seen an upswing in interest in and adoption of reinforcement learning (RL) techniques, in large part due to their demonstrated ability to perform certain tasks at "super-human levels". This has incentivized the community to prioritize research that demonstrates RL agent performance, often at the expense of research aimed at understanding agents' learning dynamics. Performance-focused research runs the risk of overfitting to academic benchmarks -- thereby rendering them less useful -- which can make it difficult to transfer proposed techniques to novel problems. Further, it implicitly diminishes work that does not push the performance frontier but instead aims to improve our understanding of these techniques. This paper argues two points: (i) RL research should stop focusing solely on demonstrating agent capabilities and focus more on advancing the science and understanding of reinforcement learning; and (ii) we need to be more precise about how our benchmarks map to the underlying mathematical formalisms. We use the popular Arcade Learning Environment (ALE; Bellemare et al., 2013) as an example of a benchmark that, despite being increasingly considered "saturated", can still be used effectively for developing this understanding and for facilitating the deployment of RL techniques in impactful real-world problems.
