We introduce a new mathematical model of multi-agent reinforcement learning, the Multi-Agent Informational Learning Processor (MAILP) model. The model is based on the notion that agents have policies for a certain amount of information, and it models how this information iteratively evolves and propagates through many agents. The model is very general; the only meaningful assumption it makes is that learning for individual agents progressively slows over time.