Flocking motion control is concerned with managing the potential conflicts between the local and team objectives of multi-agent systems. The overall control process guides the agents while monitoring flock cohesiveness and localization. The underlying mechanisms may degrade when the unmodeled uncertainties associated with the flock dynamics and formation are overlooked. On the other hand, the efficiency of a control design depends on how quickly it can adapt to different dynamic situations in real time. An online model-free policy iteration mechanism is developed here to guide a flock of agents to follow an independent command generator over a time-varying graph topology. The strength of connectivity between any two agents, i.e., the graph edge weight, is determined by a position-dependent adjacency function. An online recursive least squares approach is adopted to tune the guidance strategies without knowledge of the dynamics of the agents or those of the command generator. The mechanism is compared with a reinforcement learning approach from the literature that is based on a value iteration technique. The simulation results show that the policy iteration mechanism achieves fast learning and convergence with less computational effort.
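To make the two ingredients named above concrete, the following is a minimal sketch of (i) a position-dependent edge weight and (ii) a recursive least squares (RLS) update of the kind used to tune guidance parameters online. The Gaussian kernel, the class names, and the regressor/target structure are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

# Sketch only: the kernel form and the RLS regression structure below are
# assumptions for illustration, not the formulation used in the paper.

def edge_weight(p_i, p_j, sigma=1.0):
    """Connectivity strength between two agents as a function of their
    positions (here a Gaussian kernel of the inter-agent distance)."""
    return np.exp(-np.linalg.norm(p_i - p_j) ** 2 / (2.0 * sigma ** 2))

class RLSEstimator:
    """Standard recursive least squares for tuning a parameter vector of a
    guidance (policy/critic) structure from online data, without a model."""

    def __init__(self, n_params, forgetting=0.99):
        self.theta = np.zeros(n_params)      # parameter estimate
        self.P = 1e3 * np.eye(n_params)      # covariance matrix
        self.lam = forgetting                # forgetting factor

    def update(self, phi, y):
        """One RLS step: phi is the regressor vector, y the measured target."""
        P_phi = self.P @ phi
        gain = P_phi / (self.lam + phi @ P_phi)
        error = y - phi @ self.theta
        self.theta = self.theta + gain * error
        self.P = (self.P - np.outer(gain, P_phi)) / self.lam
        return self.theta
```

In a policy iteration loop, each agent would feed its observed data into such an estimator to evaluate the current guidance strategy and then improve the strategy using the updated parameters; the model-free character comes from relying only on measured regressors and targets rather than on the agent or command-generator dynamics.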