In large-scale systems, centralised techniques for task allocation face fundamental challenges: the number of interactions is limited by resource constraints on computation, storage, and network communication. Scalability can be increased by implementing the system as a distributed task-allocation system, sharing tasks across many agents. However, this also increases the resource cost of communication and synchronisation, which is itself difficult to scale. In this paper we present four algorithms to solve these problems. In combination, these algorithms enable each agent to improve its task-allocation strategy through reinforcement learning, while adjusting how much it explores the system according to how close to optimal it believes its current strategy to be, given its past experience. We focus on distributed agent systems where the agents' behaviours are constrained by resource usage limits, restricting agents to local rather than system-wide knowledge. We evaluate these algorithms in a simulated environment where agents are given a task composed of multiple subtasks that must be allocated to other agents with differing capabilities, which then carry out those subtasks. We also simulate real-life system effects such as networking instability. Our solution is shown to solve the task allocation problem to within 6.7% of the theoretical optimum for the system configurations considered. It provides 5x better performance recovery than approaches that retain no knowledge when system connectivity is impacted, and is tested on systems of up to 100 agents with less than a 9% impact on the algorithms' performance.
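To make the idea of confidence-driven exploration concrete, the following is a minimal sketch only, not one of the four algorithms presented in the paper. It assumes rewards in [0, 1], a simple incremental value update, and a hypothetical rule that shrinks the exploration rate as the agent's best value estimate approaches the best reward it has observed. The names `AllocatingAgent`, `simulate`, `learning_rate`, and `min_explore` are illustrative assumptions.

```python
import random


class AllocatingAgent:
    """Hypothetical agent that learns which neighbour to give a subtask to."""

    def __init__(self, neighbours, learning_rate=0.1, min_explore=0.01, seed=0):
        # Estimated allocation quality per neighbour (local knowledge only).
        self.q = {n: 0.0 for n in neighbours}
        self.learning_rate = learning_rate
        self.min_explore = min_explore
        self.explore = 1.0      # start fully exploratory
        self.best_seen = 0.0    # best reward observed so far
        self.rng = random.Random(seed)

    def choose(self):
        # Explore with probability `explore`, otherwise pick the best estimate.
        if self.rng.random() < self.explore:
            return self.rng.choice(list(self.q))
        return max(self.q, key=self.q.get)

    def update(self, neighbour, reward):
        # Incremental value update towards the observed reward.
        self.q[neighbour] += self.learning_rate * (reward - self.q[neighbour])
        self.best_seen = max(self.best_seen, reward)
        # Assumed rule: the closer the best estimate is to the best reward
        # seen so far, the less the agent explores.
        confidence = max(self.q.values()) / self.best_seen if self.best_seen > 0 else 0.0
        self.explore = max(self.min_explore, 1.0 - confidence)


def simulate(qualities, steps=2000, seed=0):
    """Run one agent against neighbours with fixed, made-up reward qualities."""
    agent = AllocatingAgent(list(qualities), seed=seed)
    for _ in range(steps):
        neighbour = agent.choose()
        agent.update(neighbour, qualities[neighbour])
    return agent


if __name__ == "__main__":
    agent = simulate({"a": 0.2, "b": 0.9, "c": 0.5})
    print(max(agent.q, key=agent.q.get), round(agent.explore, 3))
```

In this sketch, discovering a neighbour that returns a higher reward than any seen before lowers the agent's confidence and raises its exploration rate again, which loosely mirrors the behaviour described above of exploring more when the current strategy appears further from optimal.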