We study the sequential decision-making problem faced by a profit-maximizing operator of an autonomous mobility-on-demand system, who must proactively decide which requests to assign to vehicles and which to reject. We formalize this problem as a Markov decision process and propose a novel combination of multi-agent Soft Actor-Critic and weighted bipartite matching to obtain an anticipative control policy. This combination factorizes the operator's otherwise intractable action space while still yielding a globally coordinated decision. Experiments based on real-world taxi data show that our method outperforms state-of-the-art benchmarks with respect to performance, stability, and computational tractability.
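To illustrate how per-agent outputs can be turned into one coordinated decision, here is a minimal sketch of the matching step only. It is not the authors' implementation. It assumes each vehicle agent has already produced a score for every open request. The `scores` matrix, the function name `match_requests`, and the rule that non-positive scores mean rejection are all hypothetical choices made for this example. The matching is solved by exhaustive search with memoization, which is only practical for very small instances.

```python
from functools import lru_cache


def match_requests(scores):
    """Max-weight bipartite matching between vehicles and requests.

    scores[v][r] is a hypothetical per-agent score for vehicle v serving
    request r (a stand-in for what a learned agent might output).
    Assumption for this sketch: a non-positive score means "do not serve".
    Returns (total_score, assignment, rejected), where assignment[v] is the
    request index given to vehicle v, or None if v stays unassigned.
    """
    n_vehicles = len(scores)
    n_requests = len(scores[0]) if scores else 0

    @lru_cache(maxsize=None)
    def best(v, used):
        # `used` is a bitmask of requests already taken by vehicles < v.
        if v == n_vehicles:
            return 0.0, ()
        # Option 1: vehicle v takes no request.
        total, rest = best(v + 1, used)
        best_total, best_assign = total, (None,) + rest
        # Option 2: vehicle v takes any free request with a positive score.
        for r in range(n_requests):
            if used & (1 << r) or scores[v][r] <= 0:
                continue
            total, rest = best(v + 1, used | (1 << r))
            total += scores[v][r]
            if total > best_total:
                best_total, best_assign = total, (r,) + rest
        return best_total, best_assign

    total, assign = best(0, 0)
    served = {r for r in assign if r is not None}
    rejected = sorted(set(range(n_requests)) - served)
    return total, list(assign), rejected


# Two vehicles, three requests. Both vehicles individually prefer request 0,
# so independent greedy choices would collide; the matching resolves this.
scores = [
    [5.0, 4.0, -1.0],
    [6.0, 1.0, -2.0],
]
total, assignment, rejected = match_requests(scores)
print(total, assignment, rejected)
```

In this toy instance both vehicles rank request 0 highest, but the matching gives request 0 to the second vehicle and request 1 to the first, and rejects request 2 because no agent scores it positively. For realistic fleet sizes one would replace the exhaustive search with a polynomial-time assignment solver such as SciPy's `linear_sum_assignment`.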