Background and motivation: Deep Reinforcement Learning (Deep RL) is a rapidly developing field. Historically, most applications have been to games (such as chess, Atari games, and Go). Deep RL is now reaching the stage where it may offer value in real-world problems, including the optimisation of healthcare systems. One such problem is where to locate ambulances between calls in order to minimise the time from an emergency call to an ambulance arriving on scene. This is known as the Ambulance Location problem. Aim: To develop an OpenAI Gym-compatible framework and simulation environment for testing Deep RL agents. Methods: A custom ambulance dispatch simulation environment was developed using OpenAI Gym and SimPy. Deep RL agents were built using PyTorch. The environment is a simplification of the real world, but allows control over the number of clusters of incident locations, the number of possible dispatch locations, the number of hospitals, and the generation of incidents that occur at different locations throughout each day. Results: A range of Deep RL agents based on Deep Q networks were tested in this custom environment. All reduced the time to respond to emergency calls compared with random allocation to dispatch points. Bagging Noisy Duelling Deep Q networks gave the most consistent performance. All methods tended to lose performance if trained for too long, so agents were saved at their optimal performance (and tested on independent simulation runs). Conclusions: Deep RL agents, developed using simulated environments, have the potential to offer a novel approach to optimising the Ambulance Location problem. Creating open simulation environments should allow more rapid progress in this field.
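To make the environment interface described in the Methods more concrete, the following is a minimal sketch of a toy ambulance location environment. It is not the authors' implementation: it uses only the Python standard library rather than OpenAI Gym or SimPy, and merely mirrors Gym's classic `reset()`/`step(action)` convention. The class name `ToyAmbulanceEnv`, its parameters, the use of straight-line distance as a stand-in for response time, and the omission of hospitals and time-of-day effects are all assumptions made for illustration.

```python
import math
import random


class ToyAmbulanceEnv:
    """Hypothetical toy environment (not the paper's code).

    Mirrors the classic Gym interface: reset() -> observation,
    step(action) -> (observation, reward, done, info).
    """

    def __init__(self, dispatch_points, incident_clusters,
                 n_ambulances=3, incidents_per_episode=50, seed=None):
        self.dispatch_points = list(dispatch_points)      # (x, y) waiting locations
        self.incident_clusters = list(incident_clusters)  # (x, y) cluster centres
        self.n_ambulances = n_ambulances
        self.incidents_per_episode = incidents_per_episode
        self.rng = random.Random(seed)

    def _observation(self):
        # Dispatch-point index of each ambulance, plus which ambulance is free.
        return tuple(self.positions) + (self.free_ambulance,)

    def reset(self):
        self.positions = [0] * self.n_ambulances  # all start at dispatch point 0
        self.free_ambulance = 0
        self.incidents_seen = 0
        return self._observation()

    def step(self, action):
        if not 0 <= action < len(self.dispatch_points):
            raise ValueError("action must index a dispatch point")
        # The action relocates the currently free ambulance.
        self.positions[self.free_ambulance] = action
        # Assumption: incidents scatter (unit Gaussian noise) around a random cluster.
        cx, cy = self.rng.choice(self.incident_clusters)
        incident = (cx + self.rng.gauss(0, 1), cy + self.rng.gauss(0, 1))
        # The nearest ambulance responds; distance is a proxy for response time.
        distances = [math.dist(self.dispatch_points[p], incident)
                     for p in self.positions]
        responder = min(range(self.n_ambulances), key=distances.__getitem__)
        reward = -distances[responder]
        self.free_ambulance = responder  # it will be relocated on the next step
        self.incidents_seen += 1
        done = self.incidents_seen >= self.incidents_per_episode
        return self._observation(), reward, done, {}


def run_episode(env, policy):
    """Run one episode and return the total (negative-distance) reward."""
    obs, total, done = env.reset(), 0.0, False
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
    return total


if __name__ == "__main__":
    env = ToyAmbulanceEnv(
        dispatch_points=[(0, 0), (10, 0), (0, 10), (10, 10)],
        incident_clusters=[(2, 2), (8, 8)],
        seed=0,
    )
    policy_rng = random.Random(1)
    # Baseline analogous to random allocation to dispatch points.
    print(run_episode(env, lambda obs: policy_rng.randrange(4)))
```

A learning agent would replace the random policy with one that maps observations to dispatch points. Because the interface follows the `reset`/`step` convention, the same episode loop can in principle be reused for any agent that exposes such a policy function.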