There hardly exists a general solver that is efficient for scheduling problems due to their diversity and complexity. In this study, we develop a two-stage framework, in which reinforcement learning (RL) and traditional operations research (OR) algorithms are combined together to efficiently deal with complex scheduling problems. The scheduling problem is solved in two stages, including a finite Markov decision process (MDP) and a mixed-integer programming process, respectively. This offers a novel and general paradigm that combines RL with OR approaches to solving scheduling problems, which leverages the respective strengths of RL and OR: The MDP narrows down the search space of the original problem through an RL method, while the mixed-integer programming process is settled by an OR algorithm. These two stages are performed iteratively and interactively until the termination criterion has been met. Under this idea, two implementation versions of the combination methods of RL and OR are put forward. The agile Earth observation satellite scheduling problem is selected as an example to demonstrate the effectiveness of the proposed scheduling framework and methods. The convergence and generalization capability of the methods are verified by the performance of training scenarios, while the efficiency and accuracy are tested in 50 untrained scenarios. The results show that the proposed algorithms could stably and efficiently obtain satisfactory scheduling schemes for agile Earth observation satellite scheduling problems. In addition, it can be found that RL-based optimization algorithms have stronger scalability than non-learning algorithms. This work reveals the advantage of combining reinforcement learning methods with heuristic methods or mathematical programming methods for solving complex combinatorial optimization problems.
翻译:在这项研究中,我们开发了一个两阶段框架,将强化学习(RL)和传统操作研究(OR)算法结合起来,以便有效地处理复杂的时间安排问题。时间安排问题分两个阶段解决,包括有限的马尔科夫决策程序和混合整数编程程序。这提供了一个新的和一般的范例,将RL与解决时间安排问题的方法结合起来,利用RL和OR的优势:MDP通过RL方法缩小原始问题的搜索空间,而混合整数编程程序则由OR算法解决。这两个阶段是迭接和互动的,直到终止标准达到为止。根据这个想法,提出了两个版本的RL和OR的混合整数编程方法。地球观测卫星的时间安排问题被选为范例,以显示拟议的更精细排程框架和方法的有效性。各种方法的趋同性和普遍化能力,通过对精良性递精度的变精度进行核实,同时对50个混合整数级编程的编程方法进行反复反复和互动的演练,从而能够以更精确的方法来测试。