We formulate the offloading of computational tasks from a dynamic group of mobile agents (e.g., cars) as a decentralized decision-making problem among autonomous agents. We design an interaction mechanism that incentivizes agents to align private and system goals by balancing competition and cooperation. In the static case, the mechanism provably admits Nash equilibria with optimal resource allocation. In a dynamic environment, the mechanism's requirement of complete information cannot be met. For such environments, we propose a novel multi-agent online learning algorithm that learns from partial, delayed, and noisy state information, thereby greatly reducing the information required. The algorithm can also learn from long-term, sparse reward signals with varying delay. Empirical results from the simulation of a V2X application confirm that, through learning, agents significantly improve both system and individual performance: they reduce offloading failure rate, communication overhead, and load variation by up to 30%, while increasing computation resource utilization and fairness. The results also confirm the algorithm's good convergence and generalization properties across different environments.