Order dispatch is one of the central problems for ride-sharing platforms. Recently, value-based reinforcement learning algorithms have shown promising performance on this problem. However, in real-world applications, the non-stationarity of the demand-supply system makes it difficult to reuse data generated in different time periods when learning the value function. In this work, motivated by the fact that the relative relationship between the values of some states is largely stable across environments, we propose a pattern transfer learning framework for value-based reinforcement learning in the order dispatch problem. Our method efficiently captures these value patterns by incorporating a concordance penalty. Experiments support the superior performance of the proposed method.
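The abstract does not give the form of the concordance penalty. To make the idea concrete, the sketch below shows one plausible reading: a pairwise hinge term that is zero when two states are ordered the same way under the new value estimates as under reference values learned from an earlier period, and positive otherwise. The function name `concordance_penalty`, the hinge form, and the toy values are assumptions for illustration, not the formulation used in the paper.

```python
import numpy as np

def concordance_penalty(v_new, v_ref):
    """Hypothetical pairwise penalty (an assumption, not the paper's definition).

    Positive when the ordering of two state values under v_new disagrees
    with their ordering under the reference values v_ref; zero otherwise.
    """
    v_new = np.asarray(v_new, dtype=float)
    v_ref = np.asarray(v_ref, dtype=float)
    # All pairwise differences V(s_i) - V(s_j), via broadcasting.
    d_new = v_new[:, None] - v_new[None, :]
    d_ref = v_ref[:, None] - v_ref[None, :]
    # Keep each unordered pair once (i < j).
    i, j = np.triu_indices(len(v_new), k=1)
    # Hinge on the product of differences: concordant pairs contribute zero.
    return float(np.mean(np.maximum(0.0, -d_new[i, j] * d_ref[i, j])))

# Toy reference values for three states (illustrative numbers only).
v_ref = [1.0, 2.0, 3.0]
same_order = concordance_penalty([10.0, 20.0, 30.0], v_ref)  # scale differs, order agrees
flipped = concordance_penalty([30.0, 20.0, 10.0], v_ref)     # order reversed
```

Under this reading, the penalty ignores differences in scale between periods and reacts only to disagreements in ordering, which is the kind of stability the abstract appeals to. In a value-learning loop, such a term would presumably be added to the usual temporal-difference loss with a tunable weight; that weighting is likewise an assumption here.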