This paper studies the minimum-age scheduling problem in a wireless sensor network in which an access point (AP) monitors the state of an object via a set of sensors. The freshness of the sensed state, measured by the age of information (AoI), varies across sensors and is not directly observable by the AP. The AP must therefore decide which sensor to query (sample) to obtain the most up-to-date state information of the object, i.e., the state information with the minimum AoI. We formulate the minimum-age scheduling problem as a multi-armed bandit problem with partially observable arms and explore the greedy policy for minimizing the expected AoI sampled over an infinite horizon. To analyze the performance of the greedy policy, we 1) put forth a relaxed greedy policy that decouples the sampling processes of the arms, 2) formulate the sampling process of each arm as a partially observable Markov decision process (POMDP), and 3) derive the average sampled AoI under the relaxed greedy policy as a sum of the average AoI sampled from individual arms. Numerical and simulation results validate that the relaxed greedy policy closely approximates the greedy policy in terms of the expected AoI sampled over an infinite horizon.
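To make the greedy idea concrete, the following is a minimal simulation sketch, not the model analyzed in the paper. It assumes, purely for illustration, that each sensor independently obtains a fresh reading of the object in every slot with a known probability (resetting its AoI to 0) and otherwise ages by one slot. The AP cannot see the true ages; it tracks an expected AoI per sensor and queries the sensor with the smallest expected value. The function name `simulate_greedy`, the parameter `refresh_probs`, and the refresh model itself are all hypothetical.

```python
import random


def simulate_greedy(refresh_probs, horizon, seed=0):
    """Average AoI sampled by a greedy AP under an ASSUMED refresh model.

    Assumption (not from the paper): in each slot, sensor i refreshes its
    reading with probability refresh_probs[i] (AoI becomes 0); otherwise
    its AoI grows by 1. The AP knows the probabilities but not the ages.
    """
    rng = random.Random(seed)
    n = len(refresh_probs)
    ages = [0] * n          # true AoI at each sensor, hidden from the AP
    expected = [0.0] * n    # AP's expected AoI for each sensor (its belief)
    total = 0

    for _ in range(horizon):
        for i, p in enumerate(refresh_probs):
            # True (hidden) evolution of sensor i.
            if rng.random() < p:
                ages[i] = 0
            else:
                ages[i] += 1
            # Belief update: E[age'] = (1 - p) * (E[age] + 1) + p * 0.
            expected[i] = (1 - p) * (expected[i] + 1)

        # Greedy step: query the arm with the smallest expected AoI.
        arm = min(range(n), key=expected.__getitem__)
        total += ages[arm]
        # Sampling reveals the true AoI of the queried sensor only.
        expected[arm] = float(ages[arm])

    return total / horizon
```

Calling, for example, `simulate_greedy([0.2, 0.5, 0.8], horizon=100_000)` returns an empirical estimate of the long-run average sampled AoI under these assumed dynamics. A finite horizon stands in for the infinite-horizon average.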