A fundamental problem for waveform-agile radar systems is that the true environment is unknown, and transmission policies that perform well for one tracking instance may be sub-optimal for another. Additionally, the time window for each target track is limited, so the radar must learn an effective strategy from a sequence of measurements in a timely manner. This paper studies a Bayesian meta-learning model for radar waveform selection that seeks to learn an inductive bias to quickly optimize tracking performance across a class of radar scenes. We cast the waveform selection problem in the framework of sequential Bayesian inference and introduce a contextual bandit variant of the recently proposed meta-Thompson Sampling algorithm, which learns an inductive bias in the form of a prior distribution. Each track is treated as an instance of a contextual bandit learning problem drawn from a task distribution. We show that the meta-learning process yields appreciably faster learning and significantly fewer lost tracks than a conventional learning approach equipped with an uninformative prior.
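To make the idea of learning a prior across tracks concrete, the following is a minimal sketch of meta-Thompson sampling in a deliberately simplified setting. It is not the model used in the paper. It assumes a small discrete set of contexts, Gaussian rewards with known noise variance, and independent arms, so that every update is a conjugate Gaussian update. The reward is a stand-in for a tracking-quality score; each arm stands for a waveform and each task for a track. All names (`run_task`, `MetaPrior`, `meta_thompson`) and the constants `NOISE_VAR`, `TASK_VAR`, and `META_VAR` are hypothetical and chosen only for illustration.

```python
import random

NOISE_VAR = 1.0   # reward noise variance (assumed known)
TASK_VAR = 0.25   # spread of task parameters around the shared prior mean (assumed known)
META_VAR = 4.0    # variance of the initial, uninformative meta-prior (assumed)


def run_task(theta, prior_mean, prior_var, horizon, rng):
    """Thompson sampling on one task (one track) with a tabular context set.

    theta[c][a] is the true mean reward of arm a in context c.
    Returns total reward plus per-(context, arm) pull counts and reward sums.
    """
    n_ctx, n_arms = len(theta), len(theta[0])
    counts = [[0] * n_arms for _ in range(n_ctx)]
    sums = [[0.0] * n_arms for _ in range(n_ctx)]
    total = 0.0
    for _ in range(horizon):
        c = rng.randrange(n_ctx)  # observe a context
        samples = []
        for a in range(n_arms):
            # Conjugate Gaussian posterior for the mean reward of (c, a).
            prec = 1.0 / prior_var + counts[c][a] / NOISE_VAR
            mean = (prior_mean[c][a] / prior_var + sums[c][a] / NOISE_VAR) / prec
            samples.append(rng.gauss(mean, prec ** -0.5))
        a = max(range(n_arms), key=samples.__getitem__)  # act greedily on the sample
        reward = rng.gauss(theta[c][a], NOISE_VAR ** 0.5)
        counts[c][a] += 1
        sums[c][a] += reward
        total += reward
    return total, counts, sums


class MetaPrior:
    """Gaussian meta-posterior over the unknown prior mean of each (context, arm)."""

    def __init__(self, n_ctx, n_arms):
        self.mean = [[0.0] * n_arms for _ in range(n_ctx)]
        self.var = [[META_VAR] * n_arms for _ in range(n_ctx)]

    def sample(self, rng):
        """Draw one candidate prior mean per (context, arm) for the next task."""
        return [[rng.gauss(m, v ** 0.5) for m, v in zip(mrow, vrow)]
                for mrow, vrow in zip(self.mean, self.var)]

    def update(self, counts, sums):
        """Fold one finished task into the meta-posterior."""
        for c, row in enumerate(counts):
            for a, n in enumerate(row):
                if n == 0:
                    continue  # no evidence about this pair in this task
                # The sample mean of n pulls is N(prior mean, TASK_VAR + NOISE_VAR / n).
                obs_var = TASK_VAR + NOISE_VAR / n
                prec = 1.0 / self.var[c][a] + 1.0 / obs_var
                self.mean[c][a] = (self.mean[c][a] / self.var[c][a]
                                   + (sums[c][a] / n) / obs_var) / prec
                self.var[c][a] = 1.0 / prec


def meta_thompson(prior_mean_true, n_tasks, horizon, seed=0):
    """Run a sequence of tasks, sampling a prior for each from the meta-posterior."""
    rng = random.Random(seed)
    meta = MetaPrior(len(prior_mean_true), len(prior_mean_true[0]))
    rewards = []
    for _ in range(n_tasks):
        # Each task draws its own parameters around the shared (unknown) prior mean.
        theta = [[rng.gauss(m, TASK_VAR ** 0.5) for m in row] for row in prior_mean_true]
        total, counts, sums = run_task(theta, meta.sample(rng), TASK_VAR, horizon, rng)
        meta.update(counts, sums)
        rewards.append(total)
    return meta, rewards
```

A baseline with an uninformative prior can be obtained in this sketch by calling `run_task` directly with an all-zero `prior_mean` and `prior_var=META_VAR` for every task, which gives a rough way to compare per-task reward with and without the learned prior under these assumptions.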