The Approximating Martingale Process (AMP) method has been proven effective for variance reduction in reinforcement learning (RL) in specific settings such as multiclass queueing networks. In those settings, however, the state space is relatively small and all possible state transitions can be enumerated. In this paper, we consider systems whose state space is large and whose state transitions are subject to uncertainty, thereby extending AMP into a general variance-reduction method for RL. Specifically, we investigate the application of AMP to ride-hailing systems such as Uber, where Proximal Policy Optimization (PPO) is used to optimize the policy for matching drivers and customers.
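To make the core construction concrete, the Python sketch below illustrates the AMP control-variate idea on a toy example under our own illustrative assumptions: a two-state Markov chain with a known transition matrix and a known approximate value function, so the conditional expectation E[V(X_{k+1}) | X_k] can be computed exactly. Subtracting the zero-mean martingale keeps the return estimator unbiased while shrinking its variance; in the large-scale systems studied in this paper, that conditional expectation is no longer exactly computable, which is precisely the gap being addressed. All names and values in the sketch (P, c, V_approx) are hypothetical, not taken from the paper.

```python
import numpy as np

# Minimal sketch of the AMP control variate on a toy two-state chain.
# The chain, costs, and approximate value function are illustrative
# assumptions; they are not the paper's model.

rng = np.random.default_rng(0)

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])   # known transition probabilities
c = np.array([1.0, 5.0])     # per-step cost in each state
gamma = 0.95                 # discount factor

# Any value-function approximation works; a better one removes more
# variance. For illustration we use the exact discounted value,
# V = (I - gamma * P)^{-1} c.
V_approx = np.linalg.solve(np.eye(2) - gamma * P, c)

def episode_return(T=200, with_amp=True):
    """Estimate the discounted cost from state 0, optionally
    subtracting the approximating-martingale control variate."""
    x = 0
    total, mart = 0.0, 0.0
    for k in range(T):
        x_next = rng.choice(2, p=P[x])
        total += gamma**k * c[x]
        # Martingale increment: V(X_{k+1}) minus its conditional
        # expectation given X_k (computable here because P is known).
        mart += gamma**(k + 1) * (V_approx[x_next] - P[x] @ V_approx)
        x = x_next
    return total - mart if with_amp else total

plain = [episode_return(with_amp=False) for _ in range(2000)]
amp = [episode_return(with_amp=True) for _ in range(2000)]
print(f"mean: {np.mean(plain):.3f} vs {np.mean(amp):.3f}")  # both unbiased
print(f"var : {np.var(plain):.4f} vs {np.var(amp):.4f}")    # AMP far smaller
```

Because the sketch uses the exact value function, the AMP-corrected estimator telescopes to an almost deterministic quantity, so its variance is nearly zero; with a rougher V_approx the reduction is partial. The sketch also makes the paper's premise visible: the increment requires P[x] @ V_approx, i.e., iterating over all successor states, which is exactly what becomes infeasible when the state space is large or transitions are uncertain.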