Model-free Reinforcement Learning (RL) requires the ability to sample trajectories by taking actions in the original problem environment or in a simulated version of it. Breakthroughs in RL have been largely facilitated by the development of dedicated open-source simulators with easy-to-use frameworks, such as OpenAI Gym and its Atari environments. In this paper we propose to apply the OpenAI Gym framework to discrete-event-time-based Discrete Event Multi-Agent Simulation (DEMAS). We introduce a general technique for wrapping a DEMAS simulator into the Gym framework, present the technique in detail, and implement it using the simulator ABIDES as a base. We apply this work specifically to the markets extension of ABIDES, ABIDES-Markets, and develop two benchmark financial-market OpenAI Gym environments for training daily investor and execution agents. As a result, these two environments describe classic financial problems in which the market exhibits a complex, interactive behavioral response to the experimental agent's actions.
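The wrapping technique named above can be illustrated with a minimal, self-contained sketch. This is not ABIDES code: `ToyEventSim` and `GymStyleWrapper` are hypothetical stand-ins, and the toy price dynamics are invented for illustration. The sketch only shows the general pattern of exposing a discrete-event multi-agent simulator through the Gym-style `reset`/`step` interface, where each `step` advances the event queue until the experimental agent's next decision point.

```python
import heapq


class ToyEventSim:
    """Hypothetical stand-in for a DEMAS kernel such as ABIDES:
    background agents post timestamped events that move a price."""

    def __init__(self, horizon=10):
        self.time = 0
        self.horizon = horizon
        self.price = 100.0
        # Invented background activity: one small price move per time unit.
        self.events = [(t, 1 if t % 2 else -1) for t in range(1, horizon + 1)]
        heapq.heapify(self.events)

    def run_until(self, t):
        # Process all queued background-agent events up to simulated time t.
        while self.events and self.events[0][0] <= t:
            _, delta = heapq.heappop(self.events)
            self.price += 0.1 * delta
        self.time = t


class GymStyleWrapper:
    """Minimal Gym-style (reset/step) interface around the simulator.
    Between two actions of the experimental agent, the wrapper runs all
    intervening background events, then returns (obs, reward, done, info)
    following the OpenAI Gym step contract."""

    def reset(self):
        self.sim = ToyEventSim()
        self.cash = 0.0
        self.inventory = 0
        return self._obs()

    def step(self, action):
        # action: -1 sell, 0 hold, +1 buy one unit at the current price.
        self.cash -= action * self.sim.price
        self.inventory += action
        self.sim.run_until(self.sim.time + 1)  # background agents react
        done = self.sim.time >= self.sim.horizon
        reward = self.cash + self.inventory * self.sim.price  # mark-to-market
        return self._obs(), reward, done, {}

    def _obs(self):
        return (self.sim.time, self.sim.price, self.inventory)
```

A training loop then interacts with the wrapper exactly as with any Gym environment: `obs = env.reset()`, then repeatedly `obs, reward, done, info = env.step(action)` until `done` is true, with the interactive market response hidden inside the simulator's event queue.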