The electromagnetic detection satellite scheduling problem (EDSSP) has attracted attention because of the need to detect a large number of targets. This paper proposes a mixed-integer programming model for the EDSSP and a genetic algorithm based on reinforcement learning (RL-GA). The model considers numerous factors that affect electromagnetic detection, such as detection mode and bandwidth. The RL-GA embeds a Q-learning method into an improved genetic algorithm, so that the evolution of each individual depends on the decision of the agent: Q-learning guides the population search by choosing evolution operators, which lets the reinforcement learning method make effective use of the search information. We design a reward function to update the Q value and, according to the problem characteristics, propose a new combination of <state, action>. The RL-GA also uses an elite individual retention strategy to improve search performance. A task time window selection algorithm (TTWSA) is then proposed to evaluate the performance of population evolution. Experiments on multiple instances show that the RL-GA solves the EDSSP effectively and, compared with state-of-the-art algorithms, performs better in several aspects.
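To make the idea of Q-learning choosing evolution operators concrete, the following is a minimal sketch and not the RL-GA described in the paper. Everything specific in it is an assumption made for illustration: the two operators (`flip_one`, `swap_two`), the two-valued state (whether the previous generation improved the best fitness), the reward (the gain in best fitness), the bit-string encoding, and the class and function names. The abstract does not specify the actual <state, action> combination, reward function, or TTWSA evaluation. For brevity the agent also picks one operator per generation, whereas in the RL-GA the evolution of each individual depends on the agent's decision.

```python
import random


class OperatorSelector:
    """Tabular Q-learning agent that picks an evolution operator (hypothetical design)."""

    def __init__(self, n_states, operators, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.operators = list(operators)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)
        # One Q value per (state, operator) pair, initialised to zero.
        self.q = [[0.0] * len(self.operators) for _ in range(n_states)]

    def choose(self, state):
        # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.operators))
        row = self.q[state]
        return row.index(max(row))

    def update(self, state, action, reward, next_state):
        # Standard Q-learning update rule.
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])


def flip_one(ind, rng):
    """Assumed operator: flip one randomly chosen bit."""
    child = ind[:]
    i = rng.randrange(len(child))
    child[i] = 1 - child[i]
    return child


def swap_two(ind, rng):
    """Assumed operator: swap two randomly chosen positions."""
    child = ind[:]
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child


def evolve(population, fitness, agent, generations=50):
    """Evolve the population, letting the agent choose the operator each generation."""
    state = 0  # assumed state: 0 = last generation improved, 1 = it did not
    best = max(population, key=fitness)
    for _ in range(generations):
        action = agent.choose(state)
        op = agent.operators[action]
        children = [op(ind, agent.rng) for ind in population]
        # Elite retention: the best individual so far always competes for survival.
        pool = sorted(children + [best], key=fitness, reverse=True)[:len(population)]
        new_best = pool[0]
        reward = fitness(new_best) - fitness(best)  # assumed reward: fitness gain
        next_state = 0 if reward > 0 else 1
        agent.update(state, action, reward, next_state)
        population, best, state = pool, new_best, next_state
    return best


if __name__ == "__main__":
    rng = random.Random(1)
    pop = [[rng.randint(0, 1) for _ in range(20)] for _ in range(10)]
    agent = OperatorSelector(n_states=2, operators=[flip_one, swap_two])
    print(sum(evolve(pop, sum, agent)))
```

Here `sum` stands in for the fitness evaluation. In the paper that role is played by the TTWSA, which would decode an individual into a schedule before scoring it.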