Age-of-information minimization problems have been extensively studied in real-time monitoring applications. In this paper, we consider the problem of monitoring the states of an unknown remote source that evolves according to a Markovian process. At each time slot, a central scheduler decides whether or not to schedule the source to receive a new status update, with the goal of minimizing the Mean Age of Incorrect Information (MAoII). When the scheduler knows the source parameters, we formulate the minimization problem as a Markov decision process (MDP) and prove that the optimal solution is a threshold-based policy. When the source parameters are unknown, the difficulty lies in finding a strategy with a good trade-off between exploration and exploitation: the scheduler must run an algorithm that jointly estimates the unknown parameters (exploration) and minimizes the MAoII (exploitation). In our system model, however, the source can be explored only when the scheduler decides to schedule it. A greedy approach therefore risks halting exploration permanently: if, at some time, the estimated parameters of the Markovian source are such that the corresponding optimal policy is never to transmit, the estimate can no longer be improved, which may significantly degrade the performance of the algorithm. To avoid this problem, we develop a new learning algorithm that balances exploration and exploitation. We then theoretically analyze the performance of our algorithm against a genie solution by proving that the regret at time T is bounded by a term of order log(T). Finally, we provide numerical results that highlight the performance of our derived policy compared to the greedy approach.
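The stalling risk of the greedy approach can be illustrated with a toy sketch. This is not the algorithm developed in the paper: the Bernoulli estimate `p_hat`, the rule "schedule if and only if `p_hat >= cutoff`" (a stand-in for "the optimal policy under the current estimate"), and the forced exploration at power-of-two slots are all assumptions made for illustration only.

```python
def simulate(observations, cutoff=0.5, forced_exploration=False):
    """Toy scheduler: the source is sampled only in slots where it is scheduled.

    observations[t-1] is what the source would reveal in slot t if scheduled.
    All names and rules here are hypothetical, not taken from the paper.
    """
    successes = 0
    samples = 0
    for t, x in enumerate(observations, start=1):
        # Current estimate; optimistic before any sample has been collected.
        p_hat = successes / samples if samples else 1.0
        # Greedy rule: act optimally for the current estimate.
        greedy_schedules = p_hat >= cutoff
        # Assumed exploration rule: always schedule at slots 1, 2, 4, 8, ...
        is_power_of_two = (t & (t - 1)) == 0
        if greedy_schedules or (forced_exploration and is_power_of_two):
            successes += x
            samples += 1
    p_hat = successes / samples if samples else 1.0
    return samples, p_hat


# One unlucky first observation, then the source consistently reveals 1.
observations = [0] + [1] * 63
greedy_result = simulate(observations)
explore_result = simulate(observations, forced_exploration=True)
```

With this input the greedy run collects a single sample and its estimate stays frozen at 0, while the run with forced exploration recovers at slot 2 and samples in every later slot. The power-of-two schedule is only one simple way to keep the number of forced slots logarithmic in the horizon; no regret guarantee is claimed for it.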