Sensor Control for Information Gain in Dynamic, Sparse and Partially Observed Environments

We present an approach to autonomous sensor control for information gathering in partially observable, dynamic and sparsely sampled environments. We consider the problem of controlling a sensor that makes partial observations of a space of interest so as to maximize information about the entities present in that space. We describe our approach in the context of Radio-Frequency (RF) spectrum monitoring, where the goal is to search for and track unknown, dynamic signals in the environment. To this end, we develop and demonstrate enhancements of the Deep Anticipatory Network (DAN) Reinforcement Learning (RL) framework, which uses prediction and information-gain rewards to learn information-maximization policies in reward-sparse environments. We also extend the problem to situations in which sampling the actual RF spectrum/field is limited and expensive, and propose a model-based version of the original RL algorithm that fine-tunes the controller using a model of the environment that is iteratively improved from limited samples taken from the RF field. We validate our approach against baseline expert-designed controllers in simulated RF environments of varying complexity, using different reward schemes and evaluation metrics. The results show that our system outperforms the standard DAN architecture and is more flexible and robust than several hand-coded agents. We also show that our approach adapts to non-stationary environments in which the agent must learn to adapt to changes in the emitting sources.
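
To make the notion of an information-gain reward concrete, the following is a minimal sketch and not the reward used in the DAN framework or in our experiments. It assumes a toy setting in which the spectrum is divided into discrete bands, the agent keeps an independent Bernoulli belief about whether each band is occupied, and the sensor can tune to one band per step. The function names and the detection and false-alarm probabilities are hypothetical and chosen only for illustration.

```python
import math


def bernoulli_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli(p) belief; zero when p is 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def belief_entropy(belief: list[float]) -> float:
    """Total entropy of independent per-band occupancy beliefs."""
    return sum(bernoulli_entropy(p) for p in belief)


def observe(belief: list[float], band: int, detected: bool,
            p_detect: float = 0.9, p_false_alarm: float = 0.1) -> list[float]:
    """Bayes update of the belief for the single band the sensor tuned to.

    p_detect and p_false_alarm are assumed sensor characteristics.
    """
    prior = belief[band]
    like_occupied = p_detect if detected else 1.0 - p_detect
    like_free = p_false_alarm if detected else 1.0 - p_false_alarm
    evidence = like_occupied * prior + like_free * (1.0 - prior)
    updated = list(belief)  # unobserved bands keep their prior belief
    updated[band] = like_occupied * prior / evidence
    return updated


def information_gain_reward(before: list[float], after: list[float]) -> float:
    """Reward as the reduction in belief entropy caused by one observation."""
    return belief_entropy(before) - belief_entropy(after)


# Usage: the agent is maximally uncertain about band 0 and tunes to it.
belief = [0.5, 0.5, 0.9]
new_belief = observe(belief, band=0, detected=True)
reward = information_gain_reward(belief, new_belief)
```

In this toy setting the reward for a single realized observation can be negative, for example when a confident belief is contradicted by a miss, so the quantity is non-negative only in expectation over observations.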
