Constrained Contextual Bandit Learning for Adaptive Radar Waveform Selection

A sequential decision process in which an adaptive radar system repeatedly interacts with a finite-state target channel is studied. The radar is capable of passively sensing the spectrum at regular intervals, which provides side information for the waveform selection process. The radar transmitter uses the sequence of spectrum observations as well as feedback from a collocated receiver to select waveforms which accurately estimate target parameters. It is shown that the waveform selection problem can be effectively addressed using a linear contextual bandit formulation in a manner that is both computationally feasible and sample efficient. Stochastic and adversarial linear contextual bandit models are introduced, allowing the radar to achieve effective performance in broad classes of physical environments. Simulations in a radar-communication coexistence scenario, as well as in an adversarial radar-jammer scenario, demonstrate that the proposed formulation provides a substantial improvement in target detection performance when Thompson Sampling and EXP3 algorithms are used to drive the waveform selection process. Further, it is shown that the harmful impacts of pulse-agile behavior on coherently processed radar data can be mitigated by adopting a time-varying constraint on the radar's waveform catalog.
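As a rough illustration of the formulation described above, the following sketch pairs a linear contextual bandit driven by Thompson Sampling with a time-varying restriction on the waveform catalog. Everything specific in it is an assumption made for illustration and is not taken from the paper: the toy channel (an interferer that leaves exactly one sub-band free and shifts it by one band per step), the binary reward, the per-waveform Gaussian linear model, and the hypothetical rule in `allowed_waveforms` that lets the radar move at most `max_step` catalog positions from its previous waveform inside a coherent processing interval of `cpi_length` pulses.

```python
import numpy as np


class LinearThompsonSampler:
    """Per-waveform Bayesian linear regression with Gaussian posterior sampling."""

    def __init__(self, n_waveforms, context_dim, noise_var=0.25, prior_var=1.0, seed=0):
        self.rng = np.random.default_rng(seed)
        self.noise_var = noise_var
        # One posterior precision matrix and precision-weighted mean per waveform.
        self.precision = np.stack(
            [np.eye(context_dim) / prior_var for _ in range(n_waveforms)]
        )
        self.b = np.zeros((n_waveforms, context_dim))

    def select(self, context, allowed):
        """Draw one parameter sample per allowed waveform and pick the best."""
        best_arm, best_value = None, -np.inf
        for arm in allowed:
            cov = np.linalg.inv(self.precision[arm])
            cov = (cov + cov.T) / 2.0  # guard against round-off asymmetry
            mean = cov @ self.b[arm]
            theta = self.rng.multivariate_normal(mean, cov)
            value = float(theta @ context)
            if value > best_value:
                best_arm, best_value = arm, value
        return best_arm

    def update(self, arm, context, reward):
        """Standard Gaussian conjugate update for the chosen waveform only."""
        self.precision[arm] += np.outer(context, context) / self.noise_var
        self.b[arm] += context * reward / self.noise_var


def allowed_waveforms(t, previous, n_waveforms, cpi_length=8, max_step=1):
    """Hypothetical time-varying catalog constraint (an assumption, not the paper's rule).

    The full catalog is available at the start of each coherent processing
    interval; otherwise only waveforms near the previous choice are allowed.
    """
    if previous is None or t % cpi_length == 0:
        return list(range(n_waveforms))
    low = max(0, previous - max_step)
    high = min(n_waveforms - 1, previous + max_step)
    return list(range(low, high + 1))


def simulate(n_steps=2000, n_bands=4, constrained=False, seed=0):
    """Run the toy scenario and return the mean reward over the last quarter."""
    learner = LinearThompsonSampler(n_bands, n_bands + 1, seed=seed)
    previous, rewards = None, []
    for t in range(1, n_steps + 1):
        # Context: occupancy sensed on the previous step, plus a bias term.
        occupancy = np.ones(n_bands)
        occupancy[(t - 1) % n_bands] = 0.0
        context = np.append(occupancy, 1.0)
        if constrained:
            allowed = allowed_waveforms(t, previous, n_bands)
        else:
            allowed = list(range(n_bands))
        arm = learner.select(context, allowed)
        # Toy reward: 1 if the chosen waveform lands in the currently free band.
        reward = 1.0 if arm == t % n_bands else 0.0
        learner.update(arm, context, reward)
        previous = arm
        rewards.append(reward)
    tail = n_steps // 4
    return float(np.mean(rewards[-tail:]))


if __name__ == "__main__":
    print("unconstrained:", simulate(constrained=False))
    print("constrained:  ", simulate(constrained=True))
```

In this toy setting, uniformly random selection would earn an average reward of 0.25, so the gap between that baseline and the unconstrained learner gives a feel for the value of the spectrum side information. Comparing the two runs shows what the catalog restriction costs in reward in exchange for less pulse-to-pulse agility. An adversarial variant would replace the sampler with an EXP3-style learner, which is not shown here.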
