When should an online, reinforcement-learning-based, frequency-agile cognitive radar be expected to outperform a rule-based adaptive waveform selection strategy? We seek insight into this question by examining a dynamic spectrum access scenario in which the radar wishes to transmit in the widest unoccupied bandwidth during each pulse repetition interval. Online learning is compared to a fixed rule-based sense-and-avoid strategy. We show that, given a simple Markov channel model, the problem can be examined analytically in simple cases via stochastic dominance. Additionally, we show that under more realistic channel assumptions, learning-based approaches demonstrate a greater ability to generalize. However, for well-specified problems with a short time horizon, we find that machine learning approaches may perform poorly due to the inherent limitation of convergence time. We draw conclusions as to when learning-based approaches are expected to be beneficial and provide guidelines for future study.