We present an end-to-end framework for the Assignment Problem, in which multiple tasks are mapped to a group of workers, using reinforcement learning while preserving many constraints. Tasks and workers have time constraints, and each assignment of a worker to a task incurs a cost. Each worker can perform multiple tasks until it exhausts its allowed time units (capacity). We train a reinforcement learning agent to find near-optimal solutions by minimizing the total assignment cost while maintaining hard constraints, using proximal policy optimization (PPO) to optimize the model parameters. The model generates a sequence of actions in real time, each corresponding to the assignment of a task to a worker, without having to retrain for changes in the dynamic state of the environment. In our problem setting, the reward is computed as the negative of the assignment cost. We also demonstrate results on the bin packing and capacitated vehicle routing problems using the same framework. On large problem instances, our results outperform Google OR-Tools with MIP and CP-SAT solvers in both solution quality and computation time.
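The sequential decision process sketched above can be illustrated with a minimal toy environment. This is a hypothetical sketch, not the paper's implementation: tasks are assigned one at a time, each worker has a time-unit capacity, the per-step reward is the negative assignment cost, and a simple greedy policy stands in for the learned PPO agent.

```python
# Minimal sketch (hypothetical names) of the sequential assignment setting:
# assign tasks one at a time to capacity-constrained workers; reward = -cost.

class AssignmentEnv:
    def __init__(self, costs, durations, capacities):
        self.costs = costs                   # costs[t][w]: cost of task t on worker w
        self.durations = durations           # time units each task consumes
        self.capacities = list(capacities)   # remaining time units per worker
        self.task = 0                        # index of the next task to assign

    def feasible_workers(self):
        """Workers with enough remaining capacity for the current task (hard constraint)."""
        d = self.durations[self.task]
        return [w for w, c in enumerate(self.capacities) if c >= d]

    def step(self, worker):
        """Assign the current task to `worker`; reward is the negative cost."""
        reward = -self.costs[self.task][worker]
        self.capacities[worker] -= self.durations[self.task]
        self.task += 1
        done = self.task == len(self.costs)
        return reward, done

def greedy_rollout(env):
    """Stand-in for a learned policy: always pick the cheapest feasible worker."""
    total, done = 0, False
    while not done:
        w = min(env.feasible_workers(), key=lambda w: env.costs[env.task][w])
        reward, done = env.step(w)
        total += reward
    return total
```

A trained policy would replace the greedy choice with actions sampled from a neural network, with infeasible workers masked out; the episode return then equals the negative of the total assignment cost being minimized.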