In multi-agent traffic simulation, agents are typically made to move according to predefined instructions, so they imitate human behavior mechanically and unnaturally. Human drivers, by contrast, accelerate and decelerate irregularly all the time, even in conditions where doing so seems unnecessary. To make agents in traffic simulation behave more like humans and recognize other agents' behavior in complex conditions, we propose a unified mechanism by which agents learn to choose among various accelerations using deep reinforcement learning. The input combines regenerated visual images that reveal notable features with numerical vectors that contain important data such as instantaneous speed. By processing batches of sequential data, agents can recognize the behavior of surrounding agents and decide their own acceleration. In addition, by using an architecture of fully decentralized training and fully centralized execution, we can generate a traffic flow with diverse behavior that simulates real traffic flow without violating the Markov assumption.
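As a rough illustration of the input handling described above, the following sketch keeps a short sequence of paired observations (an image and a numerical vector) per agent and picks one acceleration from a discrete set. Everything named here is an assumption made for illustration and is not taken from the paper: the `ACCELERATIONS` values, the `SequenceBuffer` class, the per-action Q-values, and the epsilon-greedy selection rule. The learned network that would produce the Q-values from the image and vector sequences is deliberately left out.

```python
from collections import deque
import random

# Hypothetical discrete acceleration choices in m/s^2 (assumed, not from the paper).
ACCELERATIONS = [-3.0, -1.0, 0.0, 1.0, 2.0]


class SequenceBuffer:
    """Keeps the most recent `length` observations (image, vector) for one agent."""

    def __init__(self, length):
        self.frames = deque(maxlen=length)

    def push(self, image, vector):
        # image: 2D list of pixel values; vector: list of floats such as speed.
        self.frames.append((image, vector))

    def ready(self):
        # True once the buffer holds a full sequence.
        return len(self.frames) == self.frames.maxlen

    def as_sequence(self):
        # Split into an image sequence and a vector sequence, oldest first.
        images = [img for img, _ in self.frames]
        vectors = [vec for _, vec in self.frames]
        return images, vectors


def choose_acceleration(q_values, epsilon, rng=random):
    """Epsilon-greedy choice over ACCELERATIONS, given one Q-value per action."""
    if len(q_values) != len(ACCELERATIONS):
        raise ValueError("expected one Q-value per acceleration")
    if rng.random() < epsilon:
        # Explore: pick any acceleration uniformly at random.
        return rng.choice(ACCELERATIONS)
    # Exploit: pick the acceleration with the highest Q-value.
    best = max(range(len(q_values)), key=lambda i: q_values[i])
    return ACCELERATIONS[best]
```

In a fuller implementation, the two sequences returned by `as_sequence` would feed a network with separate image and vector branches, and each agent would hold its own buffer so that its decision depends only on its own recent observations.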