Two-sided markets such as ride-sharing companies often involve a group of subjects who make sequential decisions across time and/or location. With the rapid development of smartphones and the Internet of Things, these companies have substantially transformed the transportation landscape. In this paper, we consider large-scale fleet management in ride-sharing companies, which involves multiple units in different areas receiving sequences of products (or treatments) over time. Major technical challenges, such as policy evaluation, arise in these studies because (i) spatial and temporal proximity induces interference between locations and times; and (ii) the large number of locations results in the curse of dimensionality. To address both challenges simultaneously, we introduce a multi-agent reinforcement learning (MARL) framework for carrying out policy evaluation in these studies. We propose novel estimators for the mean outcomes under different products that are consistent despite the high dimensionality of the state-action space. The proposed estimators perform favorably in simulation experiments. We further illustrate our method using a real dataset obtained from a two-sided marketplace company to evaluate the effects of applying different subsidizing policies. A Python implementation of our proposed method is available at https://github.com/RunzheStat/CausalMARL.