Reinforcement learning agents have demonstrated remarkable achievements in simulated environments. Data efficiency poses an impediment to carrying this success over to real environments. The design of data-efficient agents calls for a deeper understanding of information acquisition and representation. We develop concepts and establish a regret bound that together offer principled guidance. The bound sheds light on questions of what information to seek, how to seek that information, and what information to retain. To illustrate these concepts, we design simple agents that build on them and present computational results that demonstrate improvements in data efficiency.