AdaPool: A Diurnal-Adaptive Fleet Management Framework using Model-Free Deep Reinforcement Learning and Change Point Detection

This paper introduces an adaptive, model-free deep reinforcement learning approach that can recognize and adapt to diurnal patterns in a ride-sharing environment with car-pooling. Deep reinforcement learning (RL) suffers from catastrophic forgetting because it is agnostic to the timescale of changes in the distribution of experiences. Although RL algorithms are guaranteed to converge to optimal policies in Markov decision processes (MDPs), this holds only for static environments, which is a very restrictive assumption. Many real-world problems, such as ride-sharing and traffic control, involve highly dynamic environments in which RL methods yield only sub-optimal decisions. To mitigate this problem, we (1) adopt an online Dirichlet change point detection (ODCP) algorithm to detect changes in the distribution of experiences, and (2) develop a Deep Q Network (DQN) agent that is capable of recognizing diurnal patterns and making informed dispatching decisions according to the changes in the underlying environment. Rather than fixing patterns by time of week, the proposed approach automatically detects that the MDP has changed and uses the results of the new model. In addition to the adaptation logic in dispatching, this paper also proposes a dynamic, demand-aware vehicle-passenger matching and route planning framework that generates optimal routes for each vehicle based on online demand, vehicle capacities, and locations. Evaluation on the public New York City Taxi dataset shows the effectiveness of our approach in improving fleet utilization: less than 50% of the fleet is used to serve up to 90% of the requests, while maximizing profits and minimizing idle times.
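The detect-then-switch idea in the abstract can be illustrated with a minimal sketch. It does not implement the paper's ODCP algorithm or its DQN agent: the detector below is a simple two-window mean-shift test used as a stand-in, and the names `WindowShiftDetector`, `run`, `window`, and `threshold` are hypothetical. The "policy" is reduced to an integer index that is advanced each time a change is flagged.

```python
from collections import deque
from statistics import mean, pstdev


class WindowShiftDetector:
    """Stand-in change detector (NOT the paper's ODCP algorithm).

    Flags a change when the mean of a sliding recent window differs from
    the mean of a fixed reference window by more than `threshold`
    reference standard deviations.
    """

    def __init__(self, window=20, threshold=3.0):
        self.window = window
        self.threshold = threshold
        self.reference = deque(maxlen=window)
        self.recent = deque(maxlen=window)

    def update(self, x):
        # Fill the reference window first, then the sliding recent window.
        if len(self.reference) < self.window:
            self.reference.append(x)
            return False
        self.recent.append(x)
        if len(self.recent) < self.window:
            return False
        spread = pstdev(self.reference) or 1e-9  # guard against zero spread
        gap = abs(mean(self.recent) - mean(self.reference))
        if gap > self.threshold * spread:
            # Start over so the reference is rebuilt from the new regime.
            self.reference.clear()
            self.recent.clear()
            return True
        return False


def run(stream, detector):
    """Return the index of the active (hypothetical) policy at each step."""
    active = 0
    history = []
    for x in stream:
        if detector.update(x):
            # A real system would switch to, or train, a model for the new regime.
            active += 1
        history.append(active)
    return history


# Synthetic demand signal: a low-demand regime followed by a high-demand one.
stream = [10 + (i % 3) for i in range(60)] + [50 + (i % 3) for i in range(60)]
history = run(stream, WindowShiftDetector())
```

In this toy signal the active policy index changes once, shortly after the demand level jumps. Clearing both windows after a detection is a design choice of this sketch: it avoids comparing new data against a reference that mixes the two regimes.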

Deep reinforcement learning

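Deep reinforcement learning (DRL) is a machine learning approach that extends traditional reinforcement learning with deep learning techniques. The main task of traditional reinforcement learning is to let an agent learn reward-maximizing behavior from the rewards it receives from the environment. However, traditional model-free reinforcement learning methods need function approximation for the agent to learn a value function or a policy. In this setting, the strong function approximation ability of deep learning naturally became the best replacement for hand-specified features and made better-performing end-to-end learning possible.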
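As a small illustration of learning a value function through function approximation, the sketch below performs one semi-gradient Q-learning update with a linear approximator. It is a simplified stand-in for a deep network, not code from any of the works listed here; the function names, the feature vectors, and the values of `alpha` and `gamma` are assumptions chosen for the example.

```python
def q_value(weights, features):
    """Linear approximation of Q(s, a): dot product of weights and features."""
    return sum(w * f for w, f in zip(weights, features))


def q_learning_step(weights, features, reward, next_features_per_action,
                    alpha=0.1, gamma=0.9):
    """One semi-gradient Q-learning update for a linear approximator.

    `features` encodes the current (state, action) pair;
    `next_features_per_action` holds one feature vector per action
    available in the next state (empty if the next state is terminal).
    """
    best_next = max(
        (q_value(weights, f) for f in next_features_per_action),
        default=0.0,
    )
    td_error = reward + gamma * best_next - q_value(weights, features)
    # For a linear model, the gradient of Q with respect to the weights
    # is the feature vector itself.
    return [w + alpha * td_error * f for w, f in zip(weights, features)]


# Hypothetical two-feature example, starting from zero weights.
new_weights = q_learning_step(
    weights=[0.0, 0.0],
    features=[1.0, 0.0],
    reward=1.0,
    next_features_per_action=[[0.0, 1.0]],
)
```

A deep Q-network replaces the linear `q_value` with a neural network and follows the gradient of the same temporal-difference error, which is what removes the need for hand-specified features.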
