Online selection of optimal waveforms for target tracking with active sensors has long been a problem of interest. Many conventional solutions adopt an estimation-theoretic perspective, in which a waveform-specific Cram\'{e}r-Rao lower bound on measurement error is used to select the optimal waveform at each tracking step. However, this approach is valid only in the high-SNR regime and requires a rather restrictive set of assumptions about the target motion and measurement models. Further, due to computational concerns, many traditional approaches are limited to near-term, or myopic, optimization, even though radar scenes exhibit strong temporal correlation. More recently, reinforcement learning has been proposed for waveform selection, in which the problem is framed as a Markov decision process (MDP), allowing for long-term planning. However, a major limitation of reinforcement learning is that the memory length of the underlying Markov process is often unknown for realistic target and channel dynamics, and a more general framework is desirable. This work develops a universal sequential waveform selection scheme that asymptotically achieves Bellman optimality in any radar scene that can be modeled as a $U^{\text{th}}$-order Markov process for a finite but unknown integer $U$. Our approach is based on well-established tools from universal source coding, in which a stationary source is parsed into variable-length phrases in order to build a context tree, which serves as a probabilistic model of the scene's behavior. We show that an algorithm based on a multi-alphabet version of the Context-Tree Weighting (CTW) method can optimally solve a broad class of waveform-agile tracking problems while making minimal assumptions about the environment's behavior.
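The context-tree model at the heart of the scheme can be illustrated with a minimal binary CTW sketch (the paper uses a multi-alphabet variant; the depth bound, the all-zero initial context, and all identifiers below are illustrative assumptions, not the authors' implementation):

```python
import math

class CTWNode:
    """Node of a context tree with Krichevsky-Trofimov (KT) counts."""
    def __init__(self):
        self.counts = [0, 0]   # number of 0s and 1s observed at this node
        self.children = {}     # context symbol -> CTWNode
        self.log_pe = 0.0      # log KT estimator probability of data seen here
        self.log_pw = 0.0      # log CTW weighted probability

def log_add(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    m = max(a, b)
    return m + math.log1p(math.exp(min(a, b) - m))

def ctw_update(node, context, bit):
    """Update the tree along one context path (most recent symbol first)."""
    if context:  # descend toward the leaf for this context
        child = node.children.setdefault(context[0], CTWNode())
        ctw_update(child, context[1:], bit)
    # sequential KT update: P(bit) = (count(bit) + 1/2) / (total + 1)
    node.log_pe += math.log((node.counts[bit] + 0.5) /
                            (node.counts[0] + node.counts[1] + 1.0))
    node.counts[bit] += 1
    if not context:           # leaf at maximum depth: no mixing
        node.log_pw = node.log_pe
    else:
        # mix the memoryless KT model with the product of child subtrees;
        # unvisited subtrees contribute log-probability 0 (i.e., P_w = 1)
        log_kids = sum(c.log_pw for c in node.children.values())
        node.log_pw = math.log(0.5) + log_add(node.log_pe, log_kids)

def ctw_fit(bits, depth=2):
    """Feed a bit sequence through CTW; returns the root node."""
    root = CTWNode()
    for t, bit in enumerate(bits):
        ctx = tuple(bits[max(0, t - depth):t][::-1])
        ctx = ctx + (0,) * (depth - len(ctx))  # assumed all-zero initial context
        ctw_update(root, ctx, bit)
    return root
```

On a strongly Markov sequence such as `[0, 1] * 50`, the weighted probability `root.log_pw` ends up far higher than the memoryless KT estimate `root.log_pe` at the root, reflecting how CTW discovers the (here first-order) memory length without knowing it in advance.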