Mobility systems often suffer from a high price of anarchy due to the uncontrolled behavior of selfish users. This may result in societal costs that are significantly higher than those achievable by a centralized system-optimal controller. Monetary tolling schemes can effectively align the behavior of selfish users with the system optimum. Yet, they inevitably discriminate against users on the basis of income. Artificial currencies were recently presented as an effective alternative that can achieve the same performance whilst guaranteeing fairness among the population. However, those studies were based on behavioral models that may differ from practical implementations. This paper presents a data-driven approach to automatically adapt artificial-currency tolls within repeated-game settings. First, we consider a parallel-arc setting whereby users commute on a daily basis from a single origin to a single destination, choosing a route in exchange for an artificial-currency price or reward while accounting for the impact of the other users' choices on travel discomfort. Second, we devise a model-based reinforcement learning controller that autonomously learns the optimal pricing policy by interacting with the proposed framework, using the closeness of the observed aggregate flows to a desired system-optimal distribution as the reward function. Our numerical results show that the proposed data-driven pricing scheme can effectively align the users' flows with the system optimum, significantly reducing the societal costs with respect to the uncontrolled flows (by about 15% or 25%, depending on the scenario), and can respond to environmental changes in a robust and efficient manner.
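To make the notions of price of anarchy and a flow-based reward concrete, the following is a minimal sketch on a textbook two-arc network (Pigou's example), not the setting or the controller evaluated in the paper. The cost functions, the unit demand, the grid search, and the names `total_cost`, `system_optimum`, and `flow_reward` are all illustrative assumptions, as is the choice of a negative L1 distance as the reward.

```python
# Illustrative sketch only: a two-arc Pigou network with unit demand.
# Cost functions and all names below are assumptions, not taken from the paper.

def total_cost(x2: float) -> float:
    """Societal cost when a fraction x2 of the demand uses arc 2.

    Arc 1 has constant discomfort 1; arc 2 has discomfort equal to its flow.
    """
    x1 = 1.0 - x2
    return x1 * 1.0 + x2 * x2


def user_equilibrium() -> float:
    """Flow on arc 2 when users act selfishly.

    Arc 2 never costs more than arc 1 (its discomfort is at most 1),
    so every user picks arc 2.
    """
    return 1.0


def system_optimum(steps: int = 1000) -> float:
    """Flow on arc 2 that minimizes societal cost, found by grid search."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=total_cost)


def flow_reward(observed: list[float], target: list[float]) -> float:
    """Hypothetical reward: negative L1 distance between observed and target flows."""
    return -sum(abs(o - t) for o, t in zip(observed, target))


x_ue = user_equilibrium()
x_so = system_optimum()
poa = total_cost(x_ue) / total_cost(x_so)

print(f"selfish flow on arc 2: {x_ue}, system-optimal flow on arc 2: {x_so}")
print(f"price of anarchy: {poa:.4f}")
print("reward at uncontrolled flows:", flow_reward([1.0 - x_ue, x_ue], [1.0 - x_so, x_so]))
```

In this toy network the price of anarchy is 4/3: selfish routing incurs a societal cost of 1, whereas an even split incurs 0.75. The reward is zero when the observed flows match the target and grows more negative as they drift apart, which is the kind of signal a pricing controller could try to maximize.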