IntelligentCrowd: Mobile Crowdsensing via Multi-Agent Reinforcement Learning

The proliferation of smart mobile devices has made mobile crowdsensing (MCS) a promising paradigm for completing complex sensing and computation tasks. Considerable effort has gone into designing incentive mechanisms and task allocation strategies from the MCS platform's perspective to motivate mobile users to participate. In practice, however, MCS participants face many uncertainties arising from their sensing environment and from other participants' strategies, and how they interact with each other and make sensing decisions is not well understood. In this paper, we take the MCS participants' perspective and derive an online sensing policy that maximizes their payoffs from MCS participation. Specifically, we model the interactions between mobile users and sensing environments as a multi-agent Markov decision process. Each participant cannot observe the others' decisions and must decide her effort level in sensing tasks based only on local information, e.g., her own record of the quality of sensed signals. To cope with the stochastic sensing environment, we develop an intelligent crowdsensing algorithm, IntelligentCrowd, by leveraging the power of multi-agent reinforcement learning (MARL). Our algorithm leads to the optimal sensing policy for each user to maximize the expected payoff against stochastic sensing environments, and it can be implemented at the individual participant's level in a distributed fashion. Numerical simulations demonstrate that IntelligentCrowd significantly improves users' payoffs in sequential MCS tasks under various sensing dynamics.
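To make the setting concrete, here is a minimal sketch of participants who each learn an effort level from local observations only. The abstract does not give IntelligentCrowd's update rule, so this is not the paper's algorithm: it swaps in plain independent tabular Q-learning as a stand-in. Every name and number below (`Participant`, `step_environment`, `EFFORT_LEVELS`, `REWARD`, `COST`, the reward-sharing rule, the quality model) is a hypothetical assumption chosen for illustration.

```python
import random

# All constants are illustrative assumptions, not values from the paper.
EFFORT_LEVELS = [0, 1, 2]   # hypothetical discrete effort levels
QUALITY_STATES = [0, 1]     # local observation: last sensed quality (0 = low, 1 = high)
REWARD = 10.0               # hypothetical total reward shared among participants
COST = 1.0                  # hypothetical cost per unit of effort


class Participant:
    """Independent Q-learner that sees only its own local observation."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, rng=None):
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = rng or random.Random()
        self.q = {(s, a): 0.0 for s in QUALITY_STATES for a in EFFORT_LEVELS}

    def act(self, state):
        # Epsilon-greedy choice over effort levels.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(EFFORT_LEVELS)
        return max(EFFORT_LEVELS, key=lambda a: self.q[(state, a)])

    def update(self, state, action, payoff, next_state):
        # Standard one-step Q-learning update.
        best_next = max(self.q[(next_state, b)] for b in EFFORT_LEVELS)
        target = payoff + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


def step_environment(efforts, rng):
    """Toy stochastic environment (assumed): the reward is split in
    proportion to effort, and higher effort makes a high-quality
    observation more likely."""
    total = sum(efforts)
    payoffs, next_states = [], []
    for e in efforts:
        share = REWARD * e / total if total > 0 else 0.0
        payoffs.append(share - COST * e)
        p_high = min(1.0, 0.2 + 0.3 * e)
        next_states.append(1 if rng.random() < p_high else 0)
    return payoffs, next_states


def simulate(n_agents=3, n_steps=5000, seed=0):
    rng = random.Random(seed)
    agents = [Participant(rng=rng) for _ in range(n_agents)]
    states = [0] * n_agents
    for _ in range(n_steps):
        # Each agent acts on its own state; nobody sees the others' efforts.
        efforts = [agent.act(s) for agent, s in zip(agents, states)]
        payoffs, next_states = step_environment(efforts, rng)
        for agent, s, a, r, s2 in zip(agents, states, efforts, payoffs, next_states):
            agent.update(s, a, r, s2)
        states = next_states
    return agents
```

Calling `simulate()` returns the trained agents, whose `q` tables can be inspected to see which effort level each one prefers in each local state. Independent learners like these carry no optimality guarantee in a multi-agent setting; the sketch only illustrates the decentralized, local-information structure described above.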
